A  Global  Convergence  Theory  for  General 
Trust-Region-Based  Algorithms  for 
Equality  Constrained  Optimization 

John  E.  Dennis 
Mahmoud  El-Alem 
Maria  Cristina  Maciel 

September  1992 
(revised  August  1995) 


TR92-28 


Report  Documentation  Page 

Form  Approved 

OMB  No.  0704-0188 


1. REPORT DATE: AUG 1995
3. DATES COVERED: 00-00-1995 to 00-00-1995
4. TITLE AND SUBTITLE: A Global Convergence Theory for General Trust-Region-based Algorithms for Equality Constrained Optimization
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Computational and Applied Mathematics Department, Rice University, 6100 Main Street MS 134, Houston, TX, 77005-1892
12. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution unlimited
16. SECURITY CLASSIFICATION OF: a. REPORT unclassified; b. ABSTRACT unclassified; c. THIS PAGE unclassified
18. NUMBER OF PAGES: 34

Standard Form 298 (Rev. 8-98)
Prescribed by ANSI Std Z39-18


A  GLOBAL  CONVERGENCE  THEORY  FOR  GENERAL 
TRUST-REGION-BASED  ALGORITHMS  FOR  EQUALITY 
CONSTRAINED OPTIMIZATION†

J. E. DENNIS, JR.‡, MAHMOUD EL-ALEM§, AND MARIA C. MACIEL¶

Abstract. This work presents a global convergence theory for a broad class of trust-region
algorithms  for  the  smooth  nonlinear  programming  problem  with  equality  constraints.  The  main 
result  generalizes  Powell’s  1975  result  for  unconstrained  trust-region  algorithms. 

The  trial  step  is  characterized  by  very  mild  conditions  on  its  normal  and  tangential  components. 
The normal component need not be computed accurately. The theory requires a quasi-normal
component to satisfy a fraction of Cauchy decrease condition on the quadratic model of the linearized
constraints. The tangential component then must satisfy a fraction of Cauchy decrease condition
on a quadratic model of the Lagrangian function in the translated tangent space of the constraints
determined by the quasi-normal component. The Lagrange multiplier estimates and the Hessian
estimates are assumed only to be bounded.

The  other  main  characteristic  of  this  class  of  algorithms  is  that  the  step  is  evaluated  by  using  the 
augmented  Lagrangian  as  a  merit  function  and  the  penalty  parameter  is  updated  using  the  El-Alem 
scheme.  The  properties  of  the  step  together  with  the  way  that  the  penalty  parameter  is  chosen  are 
sufficient  to  establish  global  convergence. 

As an example, an algorithm is presented which can be viewed as a generalization of the
Steihaug-Toint dogleg algorithm for the unconstrained case. It is based on a quadratic programming algorithm
that uses a step in a quasi-normal direction to the tangent space of the constraints and then does
feasible conjugate reduced-gradient steps to solve the reduced quadratic program. This algorithm
should  cope  quite  well  with  large  problems  for  which  effective  preconditioners  are  known. 

Key  Words:  Constrained  Optimization,  Global  Convergence,  Trust  Regions, 
Equality  Constrained,  Nonlinear  Programming,  Conjugate  Gradient,  Inexact  Newton 
Method. 

AMS  subject  classifications.  65K05,  49D37. 

1. Introduction. This work is concerned with the development of a global convergence theory for a broad class of algorithms for the equality constrained minimization problem:

$$(\mathrm{EQC})\qquad \begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & C(x) = 0. \end{array}$$

The functions $f : \mathbb{R}^n \to \mathbb{R}$ and $C : \mathbb{R}^n \to \mathbb{R}^m$ are at least twice continuously differentiable, where $C(x) = (c_1(x), \ldots, c_m(x))^T$ and $m < n$.

Our  purpose  is  to  generalize  to  constrained  problems  a  powerful  theorem  given  in 
1975  by  Powell  for  unconstrained  problems. 

The  global  convergence  theory  that  we  establish  in  this  work  holds  for  a  class  of 
nonlinear  programming  algorithms  for  (EQC)  that  is  characterized  by  the  following 
features: 

1.  The  algorithms  of  the  family  use  the  trust-region  approach  as  a  globalization 
strategy. 


† Research was supported by DOE DE-FG005-86ER25017, CRPC CCR-9120008, AFOSR-F49620-9310212, and the REDI Foundation.

Computation, Rice University, P. O. Box 1892, Houston TX 77251.

§ Department of Mathematics, Faculty of Science, Alexandria University, Alexandria, Egypt.

¶ Departamento de Matemática, Universidad Nacional del Sur, Avenida Alem 1253, 8000 Bahia

2. All these algorithms generate steps that satisfy very mild conditions on the normal and tangential components of the trial steps. It is important to note that the condition is not imposed on the truly normal component of the trial step; instead, it is imposed on the quasi-normal component $s_c^n$, which is only required to satisfy the relaxed condition $\|s_c^n\| \le K_1 \|C(x_c)\|$ for some constant $K_1$ independent of the iteration. The conditions are that the quasi-normal component satisfies a fraction of Cauchy decrease condition on the quadratic model of the linearized constraints, and that the tangential component (as measured from the quasi-normal component) satisfies a fraction of Cauchy decrease condition on the quadratic model of the reduced Lagrangian function associated with (EQC).

3.  The  estimates  of  the  Lagrange  multiplier  vector  and  the  Hessian  matrix  are 
assumed  only  to  be  bounded  uniformly  across  all  iterations. 

4. The other main characteristic of this class of algorithms is that the step is evaluated for acceptance by using the augmented Lagrangian function, with the penalty parameter updated by the scheme proposed by El-Alem [9].

Conditions 1, 2, and 3 are satisfied by the algorithms of Byrd, Schnabel, and Shultz [2], Celis, Dennis, and Tapia [4], Byrd and Omojokun [21], and Powell and Yuan [23]. Byrd, Schnabel, and Shultz and Byrd and Omojokun require a normal component, rather than just a quasi-normal component $s_c^n$, in Condition 2.

We use the following notation: the sequence of points generated by an algorithm is denoted by $\{x_k\}$. This work also uses the subscripts $-$, $c$, and $+$ to denote the previous, the current, and the next iterates, respectively. However, when we need to work with a whole sequence we use the index $k$. The matrix $H_c$ denotes the Hessian of the Lagrangian at the current iterate or an approximation to it. Subscripted functions mean the function is evaluated at a particular point; for example, $f_c = f(x_c)$, $\ell_c = \ell(x_c, \lambda_c)$, and so on. Finally, unless otherwise specified, all norms are $\ell_2$-norms, and we use the same symbol 0 to denote the real number zero and the zero vector.

The  rest  of  the  paper  is  organized  as  follows:  In  Section  2,  we  review  the  concept 
of  fraction  of  Cauchy  decrease.  In  Section  3,  we  review  the  SQP  algorithm.  In  Section 
4,  we  survey  existing  trust-region  algorithms  for  solving  problem  (EQC).  In  Section  5, 
we  present  a  general  trust-region  algorithm  with  the  conditions  that  the  trial  step  must 
satisfy. In Section 6, we state the algorithm. Sections 7 and 8 are devoted to presenting
the  global  convergence  theory  that  we  have  developed.  In  Section  7.1,  we  state  the 
assumptions  under  which  global  convergence  is  established.  In  Section  7.2,  we  discuss 
some  properties  of  the  trial  steps.  In  Section  7.3,  we  study  the  behavior  of  the  penalty 
parameter.  Section  8  is  devoted  to  presenting  our  main  global  convergence  result.  In 
Section  9,  we  present,  as  an  example,  an  algorithm  that  solves  problem  (EQC),  and 
we prove that it fits the assumptions of the paper. This algorithm was one we had in
mind as motivation for the convergence theory. It can be viewed as a generalization
to the constrained case of the Steihaug-Toint dogleg algorithm for the unconstrained case.
This  algorithm  has  worked  quite  well  for  some  large  problems.  Finally,  we  make  some 
concluding  remarks  in  Section  10. 

2. Fraction of Cauchy decrease condition. Consider the following unconstrained minimization problem

$$(\mathrm{UCMIN})\qquad \begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & x \in \mathbb{R}^n, \end{array}$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. A trust-region algorithm for solving the above problem is an iterative procedure that computes a trial step as an approximate solution to the following trust-region subproblem:

$$(\mathrm{TRS})\qquad \begin{array}{ll} \text{minimize} & m_c(s) = f_c + \nabla f_c^T s + \tfrac12 s^T G_c s \\ \text{subject to} & \|s\| \le \delta_c, \end{array}$$

where $G_c$ is the Hessian matrix $\nabla^2 f_c$ or an approximation to it and $\delta_c > 0$ is a given trust-region radius. For a complete survey, see Moré [18] and the book of Dennis and Schnabel [7].

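To assure global convergence, the step is required only to satisfy a fraction of Cauchy decrease condition. This means that $s_c$ must predict, via the quadratic model function $m_c$, at least a fixed fraction of the decrease given by the Cauchy step on $m_c$; that is, there exists a constant $\sigma > 0$, fixed across all iterations, such that

$$m_c(0) - m_c(s_c) \ge \sigma\,[\,m_c(0) - m_c(s_c^{cp})\,], \tag{2.1}$$

where $s_c^{cp} = -t_c^{cp} \nabla f_c$ and its step length is

$$t_c^{cp} = \begin{cases} \dfrac{\|\nabla f_c\|^2}{\nabla f_c^T G_c \nabla f_c} & \text{if } \dfrac{\|\nabla f_c\|^3}{\nabla f_c^T G_c \nabla f_c} \le \delta_c \text{ and } \nabla f_c^T G_c \nabla f_c > 0, \\[2ex] \dfrac{\delta_c}{\|\nabla f_c\|} & \text{otherwise.} \end{cases}$$

Thus, $s_c^{cp}$ is the steepest-descent step for $m_c$ inside the trust region.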

The form of (2.1) we use to prove convergence is given in the following technical lemma. More details about the role of this lemma in the convergence theory of trust-region algorithms can be found in Carter [3], Moré [18], Powell [22], and Shultz, Schnabel and Byrd [25].

LEMMA 2.1. If the trial step $s_c$ satisfies a fraction of Cauchy decrease condition, then

$$m_c(0) - m_c(s_c) \ge \frac{\sigma}{2}\,\|\nabla f_c\| \min\left\{ \frac{\|\nabla f_c\|}{\|G_c\|},\ \delta_c \right\}. \tag{2.2}$$


Proof.  See  Powell  [22].  □ 
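The Cauchy step and the test (2.1) take only a few lines to compute. The following NumPy sketch is illustrative: the names `cauchy_step`, `model_decrease`, `satisfies_fcd`, and `powell_bound` are hypothetical, the model is assumed to be given by `g` $=\nabla f_c$ and a symmetric `G` $=G_c$, and $\|G_c\|$ is taken to be the spectral norm. For the Cauchy step itself ($\sigma = 1$), the decrease also satisfies the bound (2.2).

```python
import numpy as np


def cauchy_step(g, G, delta):
    """Cauchy step s_cp = -t_cp * g for m(s) = f + g^T s + 0.5 s^T G s, ||s|| <= delta."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)          # stationary point: no descent direction
    curv = g @ G @ g                     # curvature of the model along g
    if curv > 0 and gnorm**3 / curv <= delta:
        t = gnorm**2 / curv              # minimizer along -g lies inside the region
    else:
        t = delta / gnorm                # otherwise stop at the boundary
    return -t * g


def model_decrease(g, G, s):
    """Predicted decrease m(0) - m(s)."""
    return -(g @ s) - 0.5 * (s @ G @ s)


def satisfies_fcd(g, G, delta, s, sigma):
    """Fraction of Cauchy decrease test (2.1)."""
    return model_decrease(g, G, s) >= sigma * model_decrease(g, G, cauchy_step(g, G, delta))


def powell_bound(g, G, delta, sigma=1.0):
    """Right-hand side of (2.2), using the spectral norm of G."""
    gnorm, Gnorm = np.linalg.norm(g), np.linalg.norm(G, 2)
    ratio = np.inf if Gnorm == 0.0 else gnorm / Gnorm
    return 0.5 * sigma * gnorm * min(ratio, delta)
```

For example, with `g = [1, 0]`, `G = diag(2, 1)`, and `delta = 1`, the Cauchy step is `[-0.5, 0]` and both sides of (2.2) equal 0.25, so the bound is attained.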

We end this section by stating Powell's powerful theorem for unconstrained trust-region algorithms. The proof can be found in Powell [22]. More details about the convergence theory for trust-region algorithms for unconstrained optimization can be found in Fletcher [14], Moré [18], Moré and Sorensen [19], and Sorensen [26].

Theorem 2.2. Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable and bounded below on the level set $\{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$. Assume that the sequence $\{G_k\}$ is uniformly bounded. If $\{x_k\}$ is the sequence generated by any trust-region algorithm that satisfies (2.1) or (2.2), then

$$\liminf_{k \to \infty} \|\nabla f_k\| = 0.$$
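Theorem 2.2 can be observed on a toy problem. The sketch below is a minimal trust-region loop, assumed for illustration only, whose trial step is just the Cauchy step and whose model Hessians are deliberately crude (all zero, which is certainly uniformly bounded). The acceptance threshold 0.1 and the radius-update constants 0.25, 0.75, 0.5, and 2 are conventional choices, not taken from this paper; the function names are hypothetical.

```python
import numpy as np


def cauchy_step(g, G, delta):
    """Cauchy step for m(s) = f + g^T s + 0.5 s^T G s inside ||s|| <= delta (g != 0)."""
    gnorm = np.linalg.norm(g)
    curv = g @ G @ g
    if curv > 0 and gnorm**3 / curv <= delta:
        return -(gnorm**2 / curv) * g
    return -(delta / gnorm) * g


def trust_region_cauchy(f, grad, model_hess, x0, delta=1.0, max_iter=500, gtol=1e-10):
    """Basic trust-region loop whose trial step is the Cauchy step.

    model_hess(x) may return any uniformly bounded symmetric matrix, even zero.
    Returns the final iterate and the history of gradient norms.
    """
    x = np.asarray(x0, dtype=float)
    grad_norms = []
    for _ in range(max_iter):
        g = grad(x)
        grad_norms.append(np.linalg.norm(g))
        if grad_norms[-1] <= gtol:
            break
        G = model_hess(x)
        s = cauchy_step(g, G, delta)
        pred = -(g @ s) - 0.5 * (s @ G @ s)   # predicted reduction, positive by (2.2)
        ratio = (f(x) - f(x + s)) / pred       # actual over predicted reduction
        if ratio >= 0.1:
            x = x + s                           # accept the trial step
        if ratio < 0.25:
            delta *= 0.5                        # poor agreement: shrink the region
        elif ratio > 0.75 and np.isclose(np.linalg.norm(s), delta):
            delta *= 2.0                        # good agreement at the boundary: expand
    return x, grad_norms
```

On $f(x) = \tfrac12 x^T \mathrm{diag}(1, 10)\, x$ with zero model Hessians, the recorded gradient norms drop below $10^{-4}$ within the default iteration budget: correct first-order information plus bounded (here trivial) second-order information is enough.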

Notice that this theorem does not prove convergence to a solution of the unconstrained problem; rather, it proves a “weak” first-order convergence. However, we do not see that as the point of this theorem, nor is it surprising given the weak assumptions on the sequence of local models. In other words, this theorem is not about convergence conditions on a quasi-Newton method. Such a theorem would be expected to be based on analyzing some way of estimating the Hessian, and we all know how important the method for estimating the Hessian is in the practical performance of a trust-region algorithm. In the unconstrained case, the version of Powell's theorem that says that the sequence of gradients converges to zero requires the additional hypothesis that the gradient is uniformly continuous. The algorithms here would probably require a uniformly continuous reduced gradient, a strengthening of the assumptions used here. The related algorithms mentioned earlier also prove weak first-order stationary convergence, as do we.

The point of this line of research is an analysis of the local quadratic-model/trust-region
paradigm for unconstrained optimization. In that context, this theorem says
that  the  power  of  using  a  trust-region  globalization  is  that  if  the  first  order  information 
is  correct,  then  little  is  required  of  the  second  order  information.  Specifically,  the 
sequence  of  model  Hessians  need  only  be  bounded. 

Our  theory  is  analogous  for  problem  (EQC).  In  this  case,  the  local  model  of  the 
problem  is  generally  taken  to  be  a  linear  model  of  the  constraints  and  a  quadratic 
model of the Lagrangian. The information in the local model depends on the Lagrange
multiplier estimates as well as second order information. In this paper, we
identify  a  way  to  extend  the  unconstrained  paradigm  to  problem  (EQC)  for  which 
the  only  requirement  is  boundedness  of  the  sequence  of  model  Lagrange  multipliers 
and  Hessians. 

The  above  discussion  summarizes  the  point  of  this  paper,  which  is  not  to  give  a 
convergence  proof  for  a  specific  SQP  approach  using  a  specific  Lagrange  multiplier 
estimation  technique  and  perhaps  an  exact  merit  function. 

3. The SQP algorithm. The Lagrangian function $\ell : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ associated with problem (EQC) is the function

$$\ell(x, \lambda) = f(x) + \lambda^T C(x),$$

where $\lambda = (\lambda_1, \ldots, \lambda_m)^T$ is a Lagrange multiplier vector estimate.

A common algorithm for solving problem (EQC) is the successive quadratic programming (SQP) algorithm. It is an iterative procedure. At each iteration, a step $s_c^{QP}$ and an associated Lagrange multiplier $\lambda^{QP}$ are obtained by solving the following quadratic program:

$$(\mathrm{QP})\qquad \begin{array}{ll} \text{minimize} & q_c(s) = \tfrac12 s^T H_c s + \nabla_x \ell_c^T s + \ell_c \\ \text{subject to} & \nabla C_c^T s + C_c = 0, \end{array}$$

where the matrix $H_c$ is the Hessian of the Lagrangian at $(x_c, \lambda_c)$ or an approximation to it.
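When $\nabla C_c$ has full column rank and $H_c$ is positive definite on the null space of $\nabla C_c^T$, (QP) can be solved through its KKT system. The following NumPy sketch assumes that setting; `J` stands for the $n \times m$ matrix $\nabla C_c$, the constant $\ell_c$ is dropped because it does not affect the minimizer, and the returned vector `mu` is the multiplier of the linearized constraints in the QP's own Lagrangian. The function name is hypothetical.

```python
import numpy as np


def solve_eq_qp(H, grad_l, J, c):
    """Solve min 0.5 s^T H s + grad_l^T s  subject to  J^T s + c = 0.

    H: n x n model Hessian, grad_l: gradient of the Lagrangian (length n),
    J: n x m matrix whose columns are the constraint gradients, c: C(x_c).
    Returns (s, mu), mu being the multiplier of the linearized constraints.
    Raises numpy.linalg.LinAlgError when the KKT matrix is singular.
    """
    n, m = J.shape
    kkt = np.block([[H, J], [J.T, np.zeros((m, m))]])   # KKT matrix of (QP)
    rhs = np.concatenate([-grad_l, -c])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n], sol[n:]
```

A nonsingular KKT matrix does not by itself certify a minimizer: if $H_c$ is indefinite on the null space of $\nabla C_c^T$, the computed point is a saddle point of (QP), which is exactly the difficulty discussed next.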

Unfortunately, the SQP algorithm cannot be guaranteed to work without modification. There is a fundamental difficulty in the definition of the SQP step because the second-order sufficiency condition need not hold at each iteration. By this we mean that the matrix $H_c$ need not be positive definite on the null space of $\nabla C_c^T$; hence the QP subproblem may not have a solution, or may not have a unique one. This difficulty will not arise near a solution of problem (EQC) if the standard assumptions for Newton's method hold at the solution. For this reason, the SQP algorithm usually performs very well locally. See Tapia [28] for more details.

An  effective  modification  that  deals  with  the  lack  of  positive  definiteness  on  the 
null  space  is  to  use  a  trust-region  globalization  strategy.  This  takes  us  to  the  following 
section. 

4. Existing trust-region algorithms for (EQC). A straightforward way to extend the trust-region idea to problem (EQC) is to add a trust-region constraint to the (QP) subproblem to restrict the size of the step. So, at each iteration, we solve the following trust-region subproblem:

$$\begin{array}{ll} \text{minimize} & q_c(s) = \tfrac12 s^T H_c s + \nabla_x \ell_c^T s + \ell_c \\ \text{subject to} & \nabla C_c^T s + C_c = 0, \\ & \|s\| \le \delta_c. \end{array}$$

However, in this straightforward approach, the trust-region constraint and the linearized constraints may be inconsistent, in which case the model subproblem has no solution. To overcome this difficulty, two main approaches have been introduced for dealing with the case when $\{s : \nabla C_c^T s + C_c = 0\} \cap \{s : \|s\| \le \delta_c\} = \emptyset$. They are the tangent-space approach and the full-space approach. We describe them briefly in the following subsections. More details can be found in Maciel [17]. See also Byrd, Schnabel and Shultz [2], Celis, Dennis and Tapia [4], Omojokun [21], Powell and Yuan [23], and Vardi [31], [32].

4.1. The tangent-space approach. In this approach the trial step is determined as $s_c = s_c^n + s_c^t$, where $s_c^n$ is the normal component, that is, $s_c^n$ lies inside the trust region and is normal to the null space of the constraint Jacobian, $\mathcal{N}(\nabla C_c^T)$, and $s_c^t$ is the component of the step in the tangent space of the constraints, given by $s_c^t = W_c \bar s_c^t$ with $\bar s_c^t \in \mathbb{R}^{n-m}$, where $W_c$ is an $n \times (n-m)$ matrix whose columns form a basis for $\mathcal{N}(\nabla C_c^T)$.

This raises two questions. We must say how to determine $s_c^n$, and, given $s_c^n$, we must say how to determine $s_c^t$. We proceed in reverse order. Given $s_c^n$, we determine $s_c^t$ by considering the transformed subproblem

$$\begin{array}{ll} \text{minimize} & q_c(s^t + s_c^n) \\ \text{subject to} & \nabla C_c^T s^t = 0, \\ & \|s^t\| \le \tilde\delta_c, \end{array}$$

where $\tilde\delta_c = \sqrt{\delta_c^2 - \|s_c^n\|^2}$. We choose $s_c^t$ by using one of the standard unconstrained trust-region trial-step selection methods on this reduced problem.
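For small dense problems, $W_c$ and a normal component can be formed explicitly. The sketch below shows one common way to do it, assuming $\nabla C_c$ (called `J`) has full column rank: a complete QR factorization supplies an orthonormal $W_c$, the minimum-norm solution of the linearized constraints supplies a normal $s_c^n$, and $\tilde\delta_c$ follows from $\|s_c^n + s^t\|^2 = \|s_c^n\|^2 + \|s^t\|^2$, which holds because $s_c^n$ is orthogonal to $\mathcal{N}(\nabla C_c^T)$. Large problems would avoid forming these objects; the helper names are hypothetical.

```python
import numpy as np


def null_space_basis(J):
    """Orthonormal basis W of N(J^T), for an n x m matrix J of full column rank."""
    n, m = J.shape
    Q, _ = np.linalg.qr(J, mode="complete")   # Q is n x n orthogonal
    return Q[:, m:]                           # last n - m columns are orthogonal to range(J)


def normal_step(J, c):
    """Minimum-norm solution of J^T s + c = 0; it lies in range(J), normal to N(J^T)."""
    return -J @ np.linalg.solve(J.T @ J, c)


def tangential_radius(delta, s_n):
    """tilde delta_c = sqrt(delta_c^2 - ||s_n||^2); assumes ||s_n|| <= delta_c."""
    return np.sqrt(delta**2 - s_n @ s_n)
```

Any tangential step $W_c z$ with $\|z\| \le \tilde\delta_c$ then keeps $\|s_c^n + W_c z\| \le \delta_c$, since the columns of $W_c$ are orthonormal.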

These algorithms have the trust-region capability of dealing quite well with zero
or negative curvature in the tangent space of the constraints. Thus, nonexistence of an
SQP  step  at  the  current  iterate  is  readily  handled. 

To choose $s_c^n$, Byrd, Schnabel and Shultz [2] and Vardi [31], [32] suggest relaxing the linearized constraints by replacing $C_c$ by $\alpha C_c$, where $\alpha \in (0, 1]$ is chosen to ensure that the above trust-region subproblem is feasible. Thus, $s_c^n = -\alpha \nabla C_c (\nabla C_c^T \nabla C_c)^{-1} C_c$. Observe that if $\alpha = 0$, then $\{s : \nabla C_c^T s + \alpha C_c = 0\}$ contains $s = 0$; hence, for any $\sigma \in (0, 1]$, there is some $\alpha_c \in (0, 1)$ for which $\{s : \nabla C_c^T s + \alpha_c C_c = 0\} \cap \{s : \|s\| \le \sigma\delta_c\} \ne \emptyset$.

The drawback of the above approach is that the step depends on the parameter $\alpha$, and it is not clear how to choose it.
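The relaxation itself is easy to state in code. The following sketch assumes $\nabla C_c$ (called `J`) has full column rank and, purely for illustration, picks the largest $\alpha \in (0, 1]$ that keeps the normal step inside $\sigma\delta_c$; this rule is our assumption, and, as just noted, the choice of $\alpha$ is the weak point of the approach. The function name is hypothetical.

```python
import numpy as np


def scaled_normal_step(J, c, delta, sigma=0.8):
    """Normal step for the relaxed constraints J^T s + alpha * c = 0.

    alpha is the largest value in (0, 1] with ||s_n|| <= sigma * delta;
    this particular rule for alpha is an illustrative assumption.
    """
    s_full = -J @ np.linalg.solve(J.T @ J, c)   # normal step for alpha = 1
    norm_full = np.linalg.norm(s_full)
    alpha = 1.0 if norm_full <= sigma * delta else sigma * delta / norm_full
    return alpha * s_full, alpha
```

Because the normal step is linear in $\alpha$, scaling the unrelaxed step is all that is needed; the resulting $s_c^n$ satisfies the relaxed constraints exactly and lies on or inside the sphere of radius $\sigma\delta_c$.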

Omojokun [21] used this approach to compute a trial step that does not depend on $\alpha$ by choosing $s_c^n$ to be the step that solves the following problem

$$\begin{array}{ll} \text{minimize} & \tfrac12 \|\nabla C_c^T s + C_c\|^2 \\ \text{subject to} & \|s\| \le \sigma\delta_c, \end{array}$$

for $0 < \sigma < 1$.

It might appear that Omojokun has traded the choice of $\alpha$ for the choice of $\sigma$, but in fact $\sigma$ is easy to choose. Some nominal value like $\sigma = 0.8$ is used throughout, and the particular value of $\sigma$ at a given iteration is allowed to lie in some uniformly bounded strict subinterval like $(0.7, 0.9)$. This subinterval corresponds to stopping criteria on a trust-region algorithm to solve for $s_c^n$. See Moré [18], Moré and Sorensen [19], or Dennis and Schnabel [7].
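The subproblem above is itself a small trust-region problem for a linear least-squares model, so any standard trust-region method applies. As one illustration (our choice, not necessarily the one Omojokun used), the dogleg sketch below assumes $\nabla C_c$ (called `J`) has full column rank, with `radius` playing the role of $\sigma\delta_c$. Because the dogleg path begins with the Cauchy point, the result never does worse than the Cauchy step on this model. The function name is hypothetical.

```python
import numpy as np


def normal_dogleg(J, c, radius):
    """Dogleg approximation to min 0.5*||J^T s + c||^2 s.t. ||s|| <= radius.

    J is n x m with full column rank. The path runs from 0 to the Cauchy point
    along -J c and then to the minimum-norm Gauss-Newton point.
    """
    p_gn = -J @ np.linalg.solve(J.T @ J, c)      # minimum-norm solution of J^T s = -c
    if np.linalg.norm(p_gn) <= radius:
        return p_gn
    g = J @ c                                     # model gradient at s = 0 (nonzero here)
    Jg = J.T @ g
    p_u = -((g @ g) / (Jg @ Jg)) * g              # unconstrained minimizer along -g
    if np.linalg.norm(p_u) >= radius:
        return (radius / np.linalg.norm(p_u)) * p_u
    d = p_gn - p_u                                # second leg of the dogleg
    a, b, c0 = d @ d, 2.0 * (p_u @ d), p_u @ p_u - radius**2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c0)) / (2.0 * a)   # where the leg meets the boundary
    return p_u + tau * d
```

When the full Gauss-Newton step fits inside the region, the sketch returns it and the linearized constraints are satisfied exactly; otherwise the step lies on the boundary $\|s\| = $ `radius`.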

4.2. The full-space approach. The other approach to overcoming the problem of inconsistency is the full-space approach. Algorithms based on this approach compute $s_c$ at once in the whole space $\mathbb{R}^n$ instead of considering the decomposition of the trial step. This has the advantage of avoiding the computation of a Moore-Penrose pseudoinverse solution.

The first example we know of this category of trust-region subproblems is the CDT subproblem proposed by Celis, Dennis and Tapia [4]. Instead of considering the linearized constraints $\nabla C_c^T s + C_c = 0$, they replace them by a particular inequality: $\|\nabla C_c^T s + C_c\| \le \theta_c$, where $\theta_c \in \mathbb{R}$. The CDT subproblem can be written as follows:

$$\begin{array}{ll} \text{minimize} & q_c(s) \\ \text{subject to} & \|\nabla C_c^T s + C_c\| \le \theta_c, \\ & \|s\| \le \delta_c. \end{array}$$

The key to the CDT subproblem (and its variants) is the choice of $\theta_c$. For more details, see Williamson [33]. Celis, Dennis, and Tapia [4] choose $\theta_c$ based on a fraction of Cauchy decrease condition on $\|\nabla C_c^T s + C_c\|^2$. They ask the step to satisfy, for some $r_1 \in (0, 1]$,

$$\|C_c\|^2 - \|C_c + \nabla C_c^T s\|^2 \ge r_1 \left\{ \|C_c\|^2 - \|\nabla C_c^T s_c^{cp} + C_c\|^2 \right\}.$$

This can be done by choosing

$$\theta_c^2 = (\theta_c^{fcd})^2 = r_1 \|\nabla C_c^T s_c^{cp} + C_c\|^2 + (1 - r_1)\|C_c\|^2, \tag{4.1}$$

where $s_c^{cp}$ solves the problem

$$\begin{array}{ll} \text{minimize} & \tfrac12 \|\nabla C_c^T s + C_c\|^2 \\ \text{subject to} & \|s\| \le r\delta_c, \\ & s = -t\,\nabla C_c C_c,\ t \ge 0. \end{array}$$

Note that in this case the CDT subproblem minimizes the quadratic model of $\ell$ over the set of steps inside the trust region that give at least $r_1$ times as much decrease in the squared $\ell_2$-norm of the residual of the linearized constraints as does the Cauchy step.

In order to prevent the feasible set of the subproblem from collapsing to a single point, and thus to obtain a meaningful trust-region subproblem, it is suggested that $r < 1$, for instance $r = 0.8$.
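The radius $\theta_c$ in (4.1) needs only one Cauchy step on the linearized-constraint model. The sketch below assumes $\nabla C_c$ (called `J`) has full column rank; the defaults `r = 0.8` and `r1 = 0.5` are illustrative values, and the function names are hypothetical.

```python
import numpy as np


def linearized_constraint_cauchy_step(J, c, radius):
    """Cauchy step s = -t * J c for 0.5*||J^T s + c||^2 with ||s|| <= radius."""
    g = J @ c                      # gradient of 0.5*||J^T s + c||^2 at s = 0
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros(J.shape[0])
    Jg = J.T @ g
    t = gnorm**2 / (Jg @ Jg)       # unconstrained minimizer along -g
    if t * gnorm > radius:
        t = radius / gnorm         # truncate at the trust-region boundary
    return -t * g


def theta_fcd(J, c, delta, r=0.8, r1=0.5):
    """theta_c from (4.1): theta^2 = r1*||J^T s_cp + c||^2 + (1 - r1)*||c||^2."""
    s_cp = linearized_constraint_cauchy_step(J, c, r * delta)
    res = J.T @ s_cp + c
    return np.sqrt(r1 * (res @ res) + (1.0 - r1) * (c @ c))
```

The two extremes are instructive: `r1 = 1` makes $\theta_c$ equal to the Cauchy residual, while `r1 = 0` gives $\theta_c = \|C_c\|$, so $s = 0$ becomes admissible.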

5. A general trust-region algorithm. In this section we describe a very inclusive class of trust-region algorithms.

The typical form of trust-region algorithms for solving (EQC) is basically as follows: At the current point $x_c$ with associated multiplier estimate $\lambda_c$, a step $s_c$ is computed by solving some trust-region subproblems, and a Lagrange multiplier estimate $\lambda_+$ is obtained by using some scheme. The point $x_+ = x_c + s_c$ is tested using some merit function to decide whether it is a better approximation to a solution $x_*$. Such merit functions often involve a penalty parameter, which is updated using some scheme. The trust-region radius is then adjusted and a new quadratic model is formed.

In  our  requirements  on  the  trust-region  algorithm,  the  way  of  computing  the 
trial  steps  is  replaced  by  some  conditions  the  steps  must  satisfy  and  the  estimates 
of the Lagrange multiplier vectors and the Hessian matrices need only be uniformly
bounded. This allows the inclusion of a wide variety of trust-region algorithms, and it
is exactly in the spirit of Powell's Theorem 2.2 for unconstrained trust-region methods.
In  Section  9,  we  will  present  an  example  algorithm  that  satisfies  these  mild  conditions. 

5.1. Computing the trial steps. We first write the trial step as $s_c = s_c^t + s_c^n$, where $s_c^t$ and $s_c^n$ are, respectively, the tangential and a quasi-normal component. We do not require that $s_c^n$ be normal to the tangent space.

We will require that the components $s_c^n$ and $s_c$ satisfy a fraction of Cauchy decrease condition on appropriate model functions. At the current iterate, if $C_c \ne 0$, then we will require that the quasi-normal component give at least as much decrease as $s_c^{cp} = -t_c^{cp} \nabla C_c C_c$ on the quadratic model of the linearized constraints in a trust region of radius $r\delta_c$, where the step length $t_c^{cp}$ is given by

$$t_c^{cp} = \begin{cases} \dfrac{\|\nabla C_c C_c\|^2}{\|\nabla C_c^T \nabla C_c C_c\|^2} & \text{if } \dfrac{\|\nabla C_c C_c\|^3}{\|\nabla C_c^T \nabla C_c C_c\|^2} \le \hat\delta_c, \\[2ex] \dfrac{\hat\delta_c}{\|\nabla C_c C_c\|} & \text{otherwise,} \end{cases}$$

where $\hat\delta_c = r\delta_c$ and $0 < r < 1$. In words, the step $s_c^n$ is chosen from the set of steps that satisfy a fraction of Cauchy decrease condition on the quadratic model of the linearized constraints inside $\|s\| \le \hat\delta_c$. Equivalently, $s_c^n$ lies in the set

$$S_c = \{s : \|s\| \le \hat\delta_c\} \cap \{s : \|\nabla C_c^T s + C_c\|^2 \le (\theta_c^{fcd})^2\},$$

where $(\theta_c^{fcd})^2$ is given by (4.1). Because the quasi-normal component $s_c^n$ is not required to be normal to the tangent space, a condition on the step is needed to ensure global convergence. In particular, the following condition is required:

$$\|s_c^n\| \le K_1 \|C_c\|, \tag{5.1}$$

where $K_1$ is some positive constant independent of the iteration.
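The two requirements on the quasi-normal component, membership in $S_c$ and condition (5.1), are cheap to check for a candidate $s_c^n$. The sketch below assumes $\nabla C_c$ (called `J`) has full column rank; `r`, `r1`, and the tolerance are illustrative defaults, and the function names are hypothetical.

```python
import numpy as np


def constraint_cauchy_step(J, c, radius):
    """Cauchy step for 0.5*||J^T s + c||^2 inside ||s|| <= radius (t_c^cp rule above)."""
    g = J @ c
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros(J.shape[0])
    Jg = J.T @ g
    t = gnorm**2 / (Jg @ Jg)
    if gnorm**3 / (Jg @ Jg) > radius:
        t = radius / gnorm
    return -t * g


def in_S_c(s_n, J, c, delta, r=0.8, r1=0.5, tol=1e-12):
    """Membership test for S_c: ||s|| <= r*delta and ||J^T s + c||^2 <= (theta_fcd)^2."""
    hat_delta = r * delta
    res_cp = J.T @ constraint_cauchy_step(J, c, hat_delta) + c
    theta_sq = r1 * (res_cp @ res_cp) + (1.0 - r1) * (c @ c)   # (4.1)
    res = J.T @ s_n + c
    return bool(np.linalg.norm(s_n) <= hat_delta + tol and res @ res <= theta_sq + tol)


def satisfies_5_1(s_n, c, K1):
    """Condition (5.1): ||s_n|| <= K1 * ||C_c||."""
    return bool(np.linalg.norm(s_n) <= K1 * np.linalg.norm(c))
```

For instance, with `J = [[1], [1], [0]]`, `c = [-2]`, and $\delta_c = 1$, the Cauchy step for radius 0.8 lies in $S_c$ while $s = 0$ does not. For a truly normal step, (5.1) holds with $K_1$ equal to the norm of the right inverse $\nabla C_c(\nabla C_c^T \nabla C_c)^{-1}$.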

If $s_c^n$ is normal to the tangent space, this condition holds (see Lemma 7.1) as long as $K_1$ is greater than a uniform bound on the norm of the right inverse of $\nabla C(x)^T$. When $s_c^n$ is not normal to the tangent space, we do not suggest choosing $K_1$ and enforcing (5.1). Rather, we suggest (as in Section 9) that (5.1) is enforced naturally by any reasonable algorithm for computing a linearly feasible point.

We will deal with the quasi-normal components of the trial steps assuming that
they satisfy (5.1). We are indebted to Robert Michael Lewis for informing us of the
effectiveness  of  this  feature  in  the  algorithm  which  he  has  implemented  to  solve  a 
PDE  inverse  problem  [6].  Specifically,  this  allows  special  linear  algebra  developed  for 
simulation  constraints  to  be  used  in  place  of  prohibitively  large  least-squares  solutions. 

Now we use the quasi-normal component to pick a linear manifold parallel to the null space of the constraints in which we will select the tangential component. Let $M_c = \{s : \nabla C_c^T s = \nabla C_c^T s_c^n\}$. Thus, $M_c \cap \{s = s^t + s_c^n : \|s\| \le \delta_c\} \ne \emptyset$.

Observe that, in the set $S_c$, we are taking a fraction of $\delta_c$ in order to forestall the case that $M_c$ lies too close to the boundary of the trust region of radius $\delta_c$.

On the manifold $M_c$, we consider a quadratic model $q_c(s)$ of the Lagrangian function associated with problem (EQC). Then, when $W_c^T \nabla q_c(s_c^n) \ne 0$, we ask the tangential component to satisfy a fraction of Cauchy decrease condition from $s_c^n$ on $q_c(s)$ reduced to $M_c$. That is, $s_c = s_c^t + s_c^n \in Q_c \cap M_c$, where

$$Q_c = \left\{ s = s^t + s_c^n : \|s\| \le \delta_c,\ q_c(s) - q_c(s_c^n) \le \sigma\left[ q_c\!\left(s_c^n - t_c^{cp} W_c W_c^T \nabla q_c(s_c^n)\right) - q_c(s_c^n) \right] \right\}$$

for some $\sigma > 0$, and

$$t_c^{cp} = \begin{cases} \dfrac{\|W_c^T \nabla q_c(s_c^n)\|^2}{\nabla q_c(s_c^n)^T W_c \bar H_c W_c^T \nabla q_c(s_c^n)} & \text{if } \dfrac{\|W_c^T \nabla q_c(s_c^n)\|^2 \, \|W_c W_c^T \nabla q_c(s_c^n)\|}{\nabla q_c(s_c^n)^T W_c \bar H_c W_c^T \nabla q_c(s_c^n)} \le \bar\delta_c \text{ and } \nabla q_c(s_c^n)^T W_c \bar H_c W_c^T \nabla q_c(s_c^n) > 0, \\[2ex] \dfrac{\bar\delta_c}{\|W_c W_c^T \nabla q_c(s_c^n)\|} & \text{otherwise,} \end{cases} \tag{5.2}$$

where $\bar H_c = W_c^T H_c W_c$ is the reduced Hessian matrix and $\bar\delta_c$ is the maximum length of the step allowed inside the set $M_c \cap \{s = s^t + s_c^n : \|s\| \le \delta_c\}$ in the negative reduced gradient direction $-W_c^T \nabla q_c(s_c^n)$.

Since $\|s_c^n\| \le r\delta_c$, the distance from $s_c^n$ to the boundary of the ball of radius $\delta_c$, along any direction, lies between $(1 - r)\delta_c$ and $(1 + r)\delta_c$. Hence $\bar\delta_c$ satisfies

$$(1 + r)\delta_c \ge \bar\delta_c \ge (1 - r)\delta_c. \tag{5.3}$$

We have intentionally not stated the computation of the tangential component as a trust-region subproblem. Condition (5.2) is lopsided in the sense that $\bar\delta_c$ is direction dependent, because the quasi-normal step is not the center of the natural trust region for the reduced quadratic. A better step might come from minimizing the reduced quadratic in $M_c \cap \{s = s^t + s_c^n : \|s\| \le \bar\delta_c\}$, and an ideal step would probably come from minimizing the reduced quadratic in $M_c \cap \{s = s^t + s_c^n : \|s\| \le \delta_c\}$. In any case, both result in steps that satisfy our conditions.

We have defined the tangent-space Cauchy step along $-W_c^T \nabla q_c(s_c^n)$, which is the steepest-descent direction for $q_c(s_c^n + W_c \bar s^t)$ in the $\ell_2$ norm. The steepest-descent direction in the $\|W_c \cdot\|$ norm would be $-[W_c^T W_c]^{-1} W_c^T \nabla q_c(s_c^n)$. Of course, as long as $[W_c^T W_c]^{-1}$ is uniformly bounded, which seems a reasonable assumption, either step satisfies a fraction of Cauchy decrease condition with respect to the other, and our theory holds for either. We do not need this boundedness assumption for our choice of Cauchy step. For a particular application, the choice of variables may be determined by which form of the reduced problem is easiest to precondition. See the discussion after Algorithm 9.2. For the problems of interest to us, $-[W_c^T W_c]^{-1} W_c^T \nabla q_c(s_c^n)$ would be an extremely expensive (or impossible) direction to compute.

5.2. Updating the model Lagrange multiplier and the model Hessian. The method for estimating the multiplier $\lambda_c$ is left unspecified. We only require that the sequence of estimates $\{\lambda_k\}$ be bounded. Any approximation to the Lagrange multiplier vector that produces a bounded sequence can be used. For example, setting $\lambda_k$ to a fixed vector (or even the zero vector) for all $k$ is valid. Similarly, we require only boundedness of the sequence $\{H_k\}$ of approximate Hessians. Thus, $H_k = 0$ for all $k$ is allowed. Note that here we are not addressing the question of which Lagrange multiplier and Hessian estimates produce an efficient algorithm. We are addressing some weak assumptions on those estimates $\{\lambda_k\}$ and $\{H_k\}$ that produce a globally convergent algorithm. For example, our theory applies to a form of successive linear programming.

5.3. The choice of the merit function. Let $x_c$ be the current iterate. We need to decide whether a trial step chosen to satisfy $s_c^n \in S_c$ and $s_c = s_c^n + s_c^t \in Q_c \cap M_c$ is a good step, that is, whether the step $s_c$ gives a new iterate $x_+$ that is a better approximation than $x_c$ to a solution, say $x_*$, of (EQC). In constrained optimization, a better approximation should mean improvement not only in $f$ but also in the constraint violation $\|C\|^2$. The evaluation of the trial step requires the choice of a merit function, which usually involves the objective function and the constraint violations.
Here, we use the augmented Lagrangian as a merit function:

$$\mathcal{L}(x, \lambda; \rho) = f(x) + \lambda^T C(x) + \rho\, C(x)^T C(x), \qquad \rho > 0. \tag{5.4}$$
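Evaluating (5.4) is straightforward. The small sketch below assumes `f` and `C` are Python callables returning a float and a NumPy vector, respectively; the function name is hypothetical.

```python
import numpy as np


def augmented_lagrangian(f, C, x, lam, rho):
    """Merit function (5.4): f(x) + lam^T C(x) + rho * C(x)^T C(x), rho > 0."""
    cx = np.atleast_1d(C(x))
    return f(x) + lam @ cx + rho * (cx @ cx)
```

On the feasible set the merit value reduces to $f(x)$, independently of $\lambda$ and $\rho$; away from it, $\rho$ weights the squared constraint violation. For example, with $f(x) = x^T x$, $C(x) = x_1 + x_2 - 1$, $x = (1, 1)$, $\lambda = 2$, and $\rho = 0.5$, the value is $2 + 2 + 0.5 = 4.5$.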

This function has also been used as a merit function in trust-region algorithms by Celis, Dennis, and Tapia [4], El-Alem [9], [10], and Powell and Yuan [23].

El-Alem [10] and Powell and Yuan [23] used the formula $\lambda(x) = -(\nabla C(x)^T \nabla C(x))^{-1} \nabla C(x)^T \nabla f(x)$ for updating the Lagrange multiplier. With this particular choice of the multiplier, $\lambda$ is a function of $x$, and (5.4) is an exact penalty function. This means that if $\rho$ is sufficiently large, then the solution to problem (EQC) will be an unconstrained minimizer of the penalty function. See Fletcher [12], [13].
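The following Python sketch implements the least-squares multiplier formula used in [10] and [23]. It is written in pure Python so that it is self-contained. Here `jac_C` stores $\nabla C(x)$ as an $n \times m$ list of rows, so column $j$ is the gradient of $c_j$. The small Gaussian-elimination solver, the function names, and the toy data are our own illustrative assumptions.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(m):
        p = max(range(k, m), key=lambda r: abs(M[r][k]))   # pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, m):
            factor = M[r][k] / M[k][k]
            for c in range(k, m + 1):
                M[r][c] -= factor * M[k][c]
    y = [0.0] * m
    for k in reversed(range(m)):                             # back substitution
        y[k] = (M[k][m] - sum(M[k][c] * y[c] for c in range(k + 1, m))) / M[k][k]
    return y


def ls_multiplier(grad_f: Sequence[float], jac_C: Sequence[Sequence[float]]) -> List[float]:
    """lambda(x) = -(grad C^T grad C)^{-1} grad C^T grad f, with grad C of size n x m."""
    n, m = len(jac_C), len(jac_C[0])
    A = [[sum(jac_C[i][a] * jac_C[i][b] for i in range(n)) for b in range(m)]
         for a in range(m)]                                  # grad C^T grad C
    rhs = [sum(jac_C[i][a] * grad_f[i] for i in range(n)) for a in range(m)]
    return [-v for v in solve(A, rhs)]


# Toy problem: minimize x1 + x2 subject to x1^2 + x2^2 - 2 = 0, at its solution (-1, -1).
lam = ls_multiplier([1.0, 1.0], [[-2.0], [-2.0]])
print(lam)  # [0.5]; then grad f + grad C * lam = (1, 1) + 0.5 * (-2, -2) = 0
```

At a point satisfying the first-order conditions, this formula recovers the true multiplier. Elsewhere, it gives the multiplier that best fits $\nabla f + \nabla C\lambda \approx 0$ in the least-squares sense.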

Celis, Dennis, and Tapia [4] and El-Alem [9], on the other hand, used a particular choice of the multiplier and treated it as an independent parameter. In their approach, the multiplier enters only the merit function used for accepting the step and for updating the other parameters of the algorithm. In other words, the merit function is never used explicitly to compute the optimization step; it is used only to evaluate the steps. The multiplier estimates affect the computation of the trial step through the tangential component, via the estimate of the Hessian of the Lagrangian. This is a major difference between the role of merit functions in trust-region algorithms and their role in line-search algorithms.

In the context of a line-search globalization strategy, Gill, Murray, Saunders, and Wright [15] and Schittkowski [24] have used the augmented Lagrangian in two ways: as a merit function, and as an objective function for choosing the step along the search direction. They treated the multiplier as an independent variable and proved global convergence for their algorithms.

In summary, we believe that having an exact penalty function as a merit function is, of course, a desirable property, especially in line-search algorithms. On the other hand, in practice one never really knows whether the penalty constant has been chosen large enough for the exactness property to hold. In [8], [9], global convergence for a particular trust-region method is shown with no assumption of exactness.

In this work, the choice of the multiplier estimate is left open, and $\lambda = 0$ is allowed. In that case, the merit function is the $\ell_2$ penalty function.

5.4. Evaluating the trial step. Let $s_c$ be a trial step chosen to satisfy the conditions of Section 5.1. We will accept it if it produces sufficient improvement in the merit function. To measure this improvement, we compare the actual reduction and the predicted reduction in the merit function from the current iterate $x_c$ to the new one $x_+ = x_c + s_c$. The actual reduction is defined by

$$\begin{aligned} \mathrm{Ared}_c(s_c;\rho_c) &= \mathcal{L}(x_c,\lambda_c;\rho_c) - \mathcal{L}(x_+,\lambda_+;\rho_c) \\ &= \ell(x_c,\lambda_c) - \ell(x_+,\lambda_+) + \rho_c\left(\|C_c\|^2 - \|C_+\|^2\right), \end{aligned} \tag{5.5}$$

and the predicted reduction is defined to be

$$\mathrm{Pred}_c(s_c;\rho_c) = \mathcal{L}(x_c,\lambda_c;\rho_c) - Q(s_c,\Delta\lambda_c;\rho_c), \tag{5.6}$$

where $Q(s_c,\Delta\lambda_c;\rho_c) = \ell(x_c,\lambda_c) + \nabla_x\ell(x_c,\lambda_c)^T s_c + \tfrac{1}{2} s_c^T H_c s_c + \Delta\lambda_c^T (C_c + \nabla C_c^T s_c) + \rho_c \|C_c + \nabla C_c^T s_c\|^2$.

We will accept the step and set $x_+ = x_c + s_c$ if $\mathrm{Ared}_c(s_c;\rho_c)/\mathrm{Pred}_c(s_c;\rho_c) \ge \eta_1$, where $\eta_1 \in (0,1)$ is a fixed constant. A typical value for $\eta_1$ might be $10^{-4}$.
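The Python sketch below evaluates (5.5) and (5.6) and applies the acceptance test. The function names and the list-of-rows storage of $\nabla C_c$ are illustrative assumptions. So is the toy data: $f = x_1^2 + x_2^2$, $C = x_1 + x_2 - 1$, $\lambda_c = \lambda_+ = 0.5$, and $H_c$ equal to the exact Hessian of $\ell$.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def ared(ell_c, ell_p, C_c, C_p, rho):
    """Actual reduction (5.5) from Lagrangian values and constraint vectors."""
    return ell_c - ell_p + rho * (dot(C_c, C_c) - dot(C_p, C_p))


def pred(grad_ell, H, C_c, jac_C, s, dlam, rho):
    """Predicted reduction (5.6); jac_C is grad C(x_c) as an n x m list of rows."""
    lin = [C_c[j] + sum(jac_C[i][j] * s[i] for i in range(len(s)))
           for j in range(len(C_c))]                           # C_c + grad C_c^T s
    Hs = [dot(row, s) for row in H]
    model_decrease = -(dot(grad_ell, s) + 0.5 * dot(s, Hs))    # q_c(0) - q_c(s)
    return model_decrease - dot(dlam, lin) + rho * (dot(C_c, C_c) - dot(lin, lin))


lam = 0.5
ell = lambda x: x[0] ** 2 + x[1] ** 2 + lam * (x[0] + x[1] - 1.0)
xc, s = [0.0, 0.0], [0.25, 0.25]
xp = [xc[0] + s[0], xc[1] + s[1]]
a = ared(ell(xc), ell(xp), [sum(xc) - 1.0], [sum(xp) - 1.0], rho=1.0)
p = pred([2 * xc[0] + lam, 2 * xc[1] + lam], [[2.0, 0.0], [0.0, 2.0]],
         [sum(xc) - 1.0], [[1.0], [1.0]], s, [0.0], rho=1.0)
eta1 = 1e-4
print(a, p, a / p >= eta1)   # 0.375 0.375 True
```

In this example the model is exact: $f$ is quadratic, the constraint is linear, $H_c$ is the exact Hessian, and $\Delta\lambda_c = 0$. So Ared equals Pred and the ratio is 1. In general the two differ, and Lemma 7.4 quantifies the difference.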
5.5. Updating the trust-region radius. Our strategy for updating the trust-region radius is based on the standard rules for the unconstrained case. More details can be found in Dennis and Schnabel [7] or Fletcher [14]. However, for our global convergence theory we use a modification of the radius-update strategy due to Zhang, Kim, and Lasdon [34] (see also El Hallabi and Tapia [11]). The reader will see that this modification is of no importance in practice; it is merely an analytic formality. At the beginning we set constants $\delta_{\max} > \delta_{\min}$. Each time we find an acceptable step, we start the next iteration with a value $\delta_+ \ge \delta_{\min}$. In short, $\delta_c$ can be reduced below $\delta_{\min}$ while seeking an acceptable step, but $\delta_+ \ge \delta_{\min}$ must hold at the beginning of the next iteration, once an acceptable step has been found. The following scheme evaluates the step and updates the trust-region radius.

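Algorithm 5.1. Evaluating the step and updating the trust-region radius

Given the constants: $0 < \alpha_1 < 1$, $\alpha_2 > 1$, $0 < \eta_1 < \eta_2 < 1$, and $\delta_{\max} \ge \delta_c \ge \delta_{\min} > 0$.

While $\mathrm{Ared}_c(s_c;\rho_c)/\mathrm{Pred}_c(s_c;\rho_c) < \eta_1$ (* e.g. $\eta_1 = 10^{-4}$ *)

Do not accept the step.

Reduce the trust-region radius: $\delta_c \leftarrow \alpha_1\|s_c\|$ (* e.g. $\alpha_1 = 0.5$ *), and

compute a new trial step $s_c$.

End while

If $\eta_1 \le \mathrm{Ared}_c(s_c;\rho_c)/\mathrm{Pred}_c(s_c;\rho_c) < \eta_2$ (* e.g. $\eta_2 = 0.5$ *) then

Accept the step: $x_+ = x_c + s_c$.

Set the trust-region radius: $\delta_+ = \max\{\delta_c, \delta_{\min}\}$.

End if

If $\mathrm{Ared}_c(s_c;\rho_c)/\mathrm{Pred}_c(s_c;\rho_c) \ge \eta_2$ then

Accept the step: $x_+ = x_c + s_c$.

Increase the trust-region radius:

$$\delta_+ = \min\{\delta_{\max}, \max\{\delta_{\min}, \alpha_2\delta_c\}\} \tag{5.7}$$

(* e.g. $\alpha_2 = 2$ *).

End if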
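Below is a minimal Python sketch of Algorithm 5.1. It assumes a hypothetical callback `trial(delta)` that computes a new trial step for the radius `delta` and returns the step length together with the ratio $\mathrm{Ared}_c/\mathrm{Pred}_c$. The default constants are the example values suggested in the algorithm.

```python
from typing import Callable, Tuple


def update_radius(ratio: float, step_norm: float, delta: float,
                  delta_min: float, delta_max: float,
                  alpha1: float = 0.5, alpha2: float = 2.0,
                  eta1: float = 1e-4, eta2: float = 0.5) -> Tuple[bool, float]:
    """One decision of Algorithm 5.1: returns (accepted, new radius)."""
    if ratio < eta1:                      # reject; the radius may drop below delta_min
        return False, alpha1 * step_norm
    if ratio < eta2:                      # accept; keep the radius, but at least delta_min
        return True, max(delta, delta_min)
    return True, min(delta_max, max(delta_min, alpha2 * delta))   # (5.7)


def evaluate_step(trial: Callable[[float], Tuple[float, float]], delta: float,
                  delta_min: float, delta_max: float, max_trials: int = 100) -> float:
    """Repeat trial steps until one is accepted; return the next radius delta_+.

    trial(delta) is a hypothetical callback returning (||s_c||, Ared/Pred)."""
    for _ in range(max_trials):
        step_norm, ratio = trial(delta)
        accepted, delta = update_radius(ratio, step_norm, delta, delta_min, delta_max)
        if accepted:
            return delta
    raise RuntimeError("no acceptable step found")


# Hypothetical trial: a step to the boundary that is acceptable only when delta <= 0.3.
trial = lambda d: (d, 1.0 if d <= 0.3 else -1.0)
print(evaluate_step(trial, 1.0, 0.1, 10.0))  # 1.0 -> 0.5 -> 0.25 (accepted) -> 0.5
```

As in Algorithm 5.1, the radius after a rejection is computed from the step length rather than from the old radius, so it can fall below $\delta_{\min}$. The floor $\delta_{\min}$ is reimposed only when a step is accepted.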

In practice, one might add another branch in which some $\bar\eta \in (\eta_1, \eta_2)$ is used to reduce the trust-region radius if $\eta_1 \le \mathrm{Ared}_c/\mathrm{Pred}_c < \bar\eta$. A typical value for $\bar\eta$ is 0.1. The motivation is to avoid the expense of another unacceptable trial step. Another modification sometimes used in practice is to allow internal doubling. This can be viewed loosely as letting $\alpha_2$ in (5.7) depend on $\mathrm{Ared}_c/\mathrm{Pred}_c$. See Dennis and Schnabel [7], page 144. The present analysis would allow these niceties, but to avoid further complication, we do not include them here.

Observe that in (5.5) and (5.6) we have expressed the quantities Ared and Pred as functions of $\rho$. Thus, although $\rho_c$ does not affect the choice of the trial step $s_c$, we need to determine $\rho_c$ before deciding whether to accept $s_c$. The right choice of the penalty parameter is one of the most important issues for algorithms that use the augmented Lagrangian as a merit function. This takes us to the following section.

5.6. The penalty parameter. Numerical experience with nonlinear programming algorithms that use the augmented Lagrangian as a merit function has shown that good performance depends on keeping the penalty parameter as small as possible. See Gill, Murray, Saunders, and Wright [16]. On the other hand, the global convergence theories developed by El-Alem [8], [9] and Powell and Yuan [23] require that the sequence $\{\rho_k\}$ be nondecreasing. El-Alem [8] requires that $\rho$ be chosen so that the predicted decrease in the merit function is at least as large as the decrease in $\|\nabla C_c^T s + C_c\|^2$.

As an update formula for the penalty parameter, we use El-Alem's scheme given in [9]. This scheme ensures that, at each iteration, the merit function is predicted to decrease by at least a fraction of the Cauchy decrease in the quadratic model of the constraints. This makes it compatible with the fraction of Cauchy decrease conditions imposed on the trial steps. In addition, good performance has been reported for implementations of this scheme; see Williamson [33]. The scheme can be stated as follows:

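Algorithm 5.2. Updating the penalty parameter

1. Initialization

Set $\rho_{-1} = 1$ and choose a small constant $\beta > 0$.

2. At the current iterate $x_c$, after $s_c$ has been chosen:

Compute

$$\mathrm{Pred}_c(s_c;\rho_-) = q_c(0) - q_c(s_c) - \Delta\lambda_c^T(C_c + \nabla C_c^T s_c) + \rho_-\left[\|C_c\|^2 - \|\nabla C_c^T s_c + C_c\|^2\right].$$

If $\mathrm{Pred}_c(s_c;\rho_-) \ge \frac{\rho_-}{2}\left[\|C_c\|^2 - \|\nabla C_c^T s_c + C_c\|^2\right]$,
then set $\rho_c = \rho_-$,
else set $\rho_c = \bar\rho_c + \beta$, where

$$\bar\rho_c = \frac{2\left[q_c(s_c) - q_c(0) + \Delta\lambda_c^T(C_c + \nabla C_c^T s_c)\right]}{\|C_c\|^2 - \|\nabla C_c^T s_c + C_c\|^2}.$$

End if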

The initial choice of the penalty parameter $\rho_{-1}$ is arbitrary, but it should be consistent with the scale of the problem. Here, we take $\rho_{-1} = 1$ for convenience.

An immediate consequence of the above algorithm is that, at the current iteration, we have

$$\mathrm{Pred}_c(s_c;\rho_c) \ge \frac{\rho_c}{2}\left[\|C_c\|^2 - \|C_c + \nabla C_c^T s_c\|^2\right]. \tag{5.8}$$
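The Python sketch below implements Algorithm 5.2 on the scalar quantities that appear in it. The argument names and the default value of `beta` are our own illustrative choices. The loop at the end feeds in a few hypothetical model decreases. After each update it checks (5.8), and it also checks the behavior recorded later in Lemma 7.9: the parameter never decreases, and each increase is at least $\beta$.

```python
def update_penalty(model_decrease: float, dlam_term: float,
                   c_sq: float, lin_sq: float, rho_prev: float,
                   beta: float = 0.1) -> float:
    """One update of Algorithm 5.2.

    model_decrease = q_c(0) - q_c(s_c)
    dlam_term      = dlam_c^T (C_c + grad C_c^T s_c)
    c_sq, lin_sq   = ||C_c||^2 and ||C_c + grad C_c^T s_c||^2
    """
    decrease = c_sq - lin_sq                    # decrease in the linearized constraints
    a = model_decrease - dlam_term
    if a + rho_prev * decrease >= 0.5 * rho_prev * decrease:
        return rho_prev                         # keep rho_- (no increase)
    if decrease <= 0.0:
        # Should not occur for steps satisfying (7.2); guards the division below.
        raise ValueError("no decrease in the linearized constraints")
    rho_bar = -2.0 * a / decrease               # 2[q(s) - q(0) + dlam^T(...)] / decrease
    return rho_bar + beta


def pred(model_decrease, dlam_term, c_sq, lin_sq, rho):
    return model_decrease - dlam_term + rho * (c_sq - lin_sq)


rho = 1.0                                       # rho_{-1} = 1
for model_decrease in (0.5, -1.0, -3.0, 0.2):   # hypothetical q_c(0) - q_c(s_c) values
    new_rho = update_penalty(model_decrease, 0.0, 1.0, 0.0, rho, beta=0.5)
    assert new_rho == rho or new_rho >= rho + 0.5                          # Lemma 7.9
    assert pred(model_decrease, 0.0, 1.0, 0.0, new_rho) >= 0.5 * new_rho   # (5.8)
    print(rho, "->", new_rho)
    rho = new_rho
```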

5.7. Termination of the algorithm. We use first-order necessary conditions for problem (EQC) to terminate the algorithm. The algorithm is terminated if $\|W_c^T\nabla_x\ell_c\| + \|C_c\| \le \epsilon_{tol}$. Here $\epsilon_{tol} > 0$ is a prespecified constant, and $W_c$ is a matrix whose columns form a basis for the null space of $\nabla C_c^T$. We require that $\{W_k\}$ be uniformly bounded in norm for all $k$.
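The termination test can be sketched in Python as follows, with $W_c$ stored as a list of its columns (our convention). In the toy problem $f = x_1^2 + x_2^2$, $C = x_1 + x_2 - 1$, the columns of $W_c$ are orthogonal to $\nabla C_c$. Hence $W_c^T\nabla_x\ell_c = W_c^T\nabla f_c$ for every multiplier estimate, so the test does not depend on how $\lambda_c$ is chosen.

```python
import math
from typing import Sequence


def termination_measure(W: Sequence[Sequence[float]], grad_ell: Sequence[float],
                        C: Sequence[float]) -> float:
    """||W^T grad_x l|| + ||C||, with W given as a list of its columns."""
    reduced = [sum(w_i * g_i for w_i, g_i in zip(w, grad_ell)) for w in W]
    return math.sqrt(sum(r * r for r in reduced)) + math.sqrt(sum(c * c for c in C))


# Toy problem: null space of grad C^T = [1, 1] is spanned by (1, -1)/sqrt(2).
W = [[1 / math.sqrt(2), -1 / math.sqrt(2)]]
eps_tol = 1e-8
x = (0.5, 0.5)                        # the solution of the toy problem
grad_f = [2 * x[0], 2 * x[1]]         # W^T grad_x l = W^T grad f for any lambda
print(termination_measure(W, grad_f, [x[0] + x[1] - 1.0]) <= eps_tol)   # True
```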

6. Statement of the algorithm. We present a formal description of our class of nonlinear programming algorithms.

Algorithm 6.1. The NLP-algorithm.

step 0. (Initialization)

Given $x_0$, $\lambda_0$, compute $W_0$.

Choose $\delta_0$, $\delta_{\min}$, $\delta_{\max}$, and $\epsilon_{tol} > 0$.

Set $\rho_{-1} = 1$ and choose $\beta > 0$.

step 1. (Test for convergence)

If $\|W_c^T\nabla_x\ell(x_c,\lambda_c)\| + \|C(x_c)\| \le \epsilon_{tol}$
then terminate.

End if

step 2. (Compute a trial step)

If $x_c$ is feasible then

a) find a step $s_c^t$ that satisfies a fraction of Cauchy decrease condition on the quadratic model $q_c(s)$ of the Lagrangian around $x_c$. (This might be done by solving a trust-region subproblem, since $s_c^n = 0$ is available. See Section 5.1.)

b) Set $s_c = s_c^t$.

else (* $C(x_c) \ne 0$ *)

a) Compute a quasi-normal step $s_c^n$ that satisfies a fraction of Cauchy decrease condition on the squared-norm quadratic model of the linearized constraints. (See Section 5.1.)

b) If $W_c^T\nabla q_c(s_c^n) = 0$

then set $s_c^t = 0$

else find $s_c^t$ that satisfies a fraction of Cauchy decrease condition on the quadratic model $q_c(s_c^n + s)$ from $s_c^n$. (Perhaps not by solving a specific trust-region subproblem. See Section 5.1.)

End if

c) Set $s_c = s_c^n + s_c^t$.

End if

step 3. (Update $\lambda_c$)

Choose an estimate $\lambda_+$ of the Lagrange multiplier vector.

Set $\Delta\lambda_c = \lambda_+ - \lambda_c$.

step 4. (Update the penalty parameter)

Update $\rho_-$ to obtain $\rho_c$ by using Algorithm 5.2.

step 5. (Evaluate the step)

Compute

$$\mathrm{Ared}_c(s_c;\rho_c) = \ell(x_c,\lambda_c) - \ell(x_+,\lambda_+) + \rho_c\left(\|C_c\|^2 - \|C_+\|^2\right).$$

Evaluate the step and update the trust-region radius by using Algorithm 5.1.

If the step is accepted

then update $H_c$ and go to step 1.
else

go to step 2.

End if

The above is a typical trust-region algorithm for solving problem (EQC). We leave the way of computing the trial steps unspecified, which allows a wide variety of trial step calculation techniques to be included. For similar reasons, we leave the way of updating the Lagrange multiplier vector and the Hessian matrix unspecified.
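Step 2 is deliberately left open. The Python sketch below shows one concrete possibility, built on our own assumptions: both components are taken to be Cauchy points, which trivially satisfy a fraction of Cauchy decrease. The quasi-normal step minimizes $\frac12\|C_c + \nabla C_c^T s\|^2$ along its steepest-descent direction, within a hypothetical fraction `theta` of the radius. The tangential step does the same for $q_c(s_c^n + W_c u)$ within the remaining radius. The sketch assumes $W_c$ has orthonormal columns. Since $s_c^n$ lies in the range of $\nabla C_c$, it is then orthogonal to $s_c^t$, and $\|s_c\| \le \delta_c$. The names `cauchy_step`, `trial_step`, and `theta` are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vec = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cauchy_step(g: Vec, B: Callable[[Vec], Vec], radius: float) -> Vec:
    """Cauchy point of m(u) = g^T u + 0.5 u^T B u subject to ||u|| <= radius."""
    gg = dot(g, g)
    if gg == 0.0:
        return [0.0] * len(g)
    t = radius / math.sqrt(gg)           # step length to the boundary along -g
    curv = dot(g, B(g))
    if curv > 0.0:
        t = min(t, gg / curv)            # 1-D minimizer along -g, if it is interior
    return [-t * gi for gi in g]


def trial_step(grad_ell: Vec, H: List[Vec], C: Vec, jac_C: List[Vec],
               W: List[Vec], delta: float, theta: float = 0.8) -> Vec:
    """Hypothetical realization of step 2 of Algorithm 6.1 with Cauchy points.

    jac_C holds grad C(x_c) as n rows of length m; W holds an orthonormal basis
    of the null space of grad C(x_c)^T as a list of columns.
    """
    n, m = len(grad_ell), len(C)

    def Jt(v: Vec) -> Vec:               # grad C^T v
        return [sum(jac_C[i][j] * v[i] for i in range(n)) for j in range(m)]

    def J(w: Vec) -> Vec:                # grad C w
        return [sum(jac_C[i][j] * w[j] for j in range(m)) for i in range(n)]

    def Wu(u: Vec) -> Vec:               # W u
        return [sum(W[k][i] * u[k] for k in range(len(W))) for i in range(n)]

    def reduced_H(u: Vec) -> Vec:        # W^T H W u
        v = Wu(u)
        Hv = [dot(row, v) for row in H]
        return [dot(w, Hv) for w in W]

    # Quasi-normal step: Cauchy point of 0.5 ||C + grad C^T s||^2 in ||s|| <= theta * delta.
    s_n = cauchy_step(J(C), lambda v: J(Jt(v)), theta * delta)
    # Tangential step: Cauchy point of q_c(s_n + W u) over the remaining radius.
    grad_q = [g + dot(row, s_n) for g, row in zip(grad_ell, H)]   # grad q_c(s_n)
    r = [dot(w, grad_q) for w in W]                                # W^T grad q_c(s_n)
    remaining = math.sqrt(max(delta ** 2 - dot(s_n, s_n), 0.0))
    s_t = Wu(cauchy_step(r, reduced_H, remaining))                 # zero step if r = 0
    return [a + b for a, b in zip(s_n, s_t)]


# Toy problem: f = x1^2 + x2^2, C = x1 + x2 - 1, x_c = (0, 0), lambda_c = 0, exact H_c.
W = [[1 / math.sqrt(2), -1 / math.sqrt(2)]]
H = [[2.0, 0.0], [0.0, 2.0]]
print(trial_step([0.0, 0.0], H, [-1.0], [[1.0], [1.0]], W, delta=10.0))  # [0.5, 0.5]
```

In the example, the quasi-normal step alone reaches the solution $(0.5, 0.5)$. There $W_c^T\nabla q_c(s_c^n) = 0$, so the tangential step is zero, matching branch b) of step 2.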

In  the  next  two  sections  we  prove  global  convergence  of  the  above  algorithm  class. 

7.  The  global  convergence  theory.  Before  beginning  our  global  convergence 
theory,  let  us  give  an  overview  of  the  steps  that  comprise  this  theory. 

The  trial  step  is  chosen  to  satisfy  a  sufficient  predicted  decrease  condition,  the 
fraction  of  Cauchy  decrease.  Note  that  in  our  algorithm,  we  assume  that  the  tangential 
and  the  quasi-normal  components  of  any  trial  step  each  satisfy  this  condition.  In 
Lemma  7.2,  we  will  express  this  in  a  technical  form  similar  to  inequality  (2.2). 

The  definition  of  predicted  reduction  is  shown  to  give  an  approximation  to  the 
actual  reduction  that  is  accurate  to  within  the  square  of  the  trial  step  length  times 
the  penalty  parameter.  This  is  proved  in  Lemma  7.5.  However,  we  emphasize  again
that  the  step  is  not  chosen  to  maximize  the  predicted  decrease. 

We now introduce some notation for the quantities computed during the trial steps. We have not introduced this notation until now because it obscures the simplicity of the algorithm. However, the analysis that follows needs to establish properties of every trial step, not just of the successful steps $\{s_k\}$. Therefore, let $s_k^i$, $\delta_k^i$, and $\rho_k^i$ denote the quantities set by Algorithm 6.1 at the $i$th trial of the $k$th iteration, as it searches for an acceptable step. Thus, at the first trial step of the $k$th iteration, $s_k^0$ is set by the first pass through step 2, and $\rho_k^0$ is set from $\rho_{k-1}$ the first time through step 4. If the trial step $s_k^i$ is acceptable, then $s_k = s_k^i$, $\rho_k = \rho_k^i$, and $\delta_k^i$ is updated to become $\delta_{k+1}$. In short, the algorithm is simpler to explain and code if one counts only successful steps. For the analysis, however, one needs a way to refer unambiguously to all the trial steps.

The  model  Lagrange  multipliers  also  may  depend  on  i.  However,  to  keep  the 
notation  as  simple  as  possible,  we  do  not  make  this  dependence  explicit. 

The penalty parameters $\rho_k^i$ are shown to be bounded for $\epsilon_{tol} > 0$ as long as the algorithm does not terminate. The technique is to prove two facts about any iteration $k$ at which the penalty parameter is increased. First, the product of the penalty parameter $\rho_k^i$ and the trust-region radius $\delta_k^i$ is bounded by a constant that depends on neither $k$ nor $i$ (Lemma 7.10). Second, the trust-region radii $\delta_k^i$ are bounded away from zero (Lemma 7.11). The proof of Lemma 7.11 shows the crucial role played by keeping the trust-region radius no smaller than $\delta_{\min}$ after every acceptable step; see Section 5.5. Finally, under the assumption that the algorithm does not terminate, the penalty parameter $\rho_k$ is shown to be bounded. The proof is given in Lemma 7.12.

The algorithm is shown to be well-defined in the sense that, at a given iterate, it either terminates or finds an acceptable step after finitely many trials. This result is proved in Theorem 8.1. Using the above results and Theorem 8.1, the trust-region radius is shown to be bounded away from zero. The proof is given in Lemma 8.2.

Finally, Theorem 8.4 shows that for any $\epsilon_{tol} > 0$ the algorithm always terminates, i.e., the termination condition of the algorithm is met after finitely many iterations.

7.1. The problem assumptions. We start by stating the assumptions under which global convergence is proved for Algorithm 6.1. Assumptions A1-A5 (see below) are used by Byrd, Schnabel, and Shultz [2], El-Alem [8], [9], [10], and Powell and Yuan [23]. Their particular choices of Lagrange multiplier vectors satisfy A6.

Let the sequence of iterates $\{x_k\}$ generated by the algorithm satisfy:

A1. For all $k$, $x_k$ and $x_k + s_k^i \in \Omega$, where $\Omega$ is a convex subset of $\mathbb{R}^n$.

A2. $f, C \in C^2(\Omega)$.

A3. $\operatorname{rank}(\nabla C(x)) = m$ for all $x \in \Omega$.

A4. $f(x)$, $\nabla f(x)$, $\nabla^2 f(x)$, $C(x)$, $\nabla C(x)$, $(\nabla C(x)^T\nabla C(x))^{-1}$, $W(x)$, and $\nabla^2 c_i(x)$ for $i = 1, \ldots, m$ are all uniformly bounded in $\Omega$.

A5. The matrices $H_k$, $k = 1, 2, \ldots$, are uniformly bounded.

A6. The vectors $\lambda_k$, $k = 1, 2, \ldots$, are uniformly bounded.

Assumption A4 means that there exist positive constants $\nu, \nu_0, \nu_1, \nu_2, \nu_3, \nu_4, \nu_5$, and $\nu_6$ such that, for all $x \in \Omega$: $|f(x)| \le \nu$, $\|\nabla f(x)\| \le \nu_0$, $\|C(x)\| \le \nu_1$, $\|\nabla C(x)\| \le \nu_2$, $\|(\nabla C(x)^T\nabla C(x))^{-1}\| \le \nu_3$, $\|\nabla^2 f(x)\| \le \nu_4$, $\|\nabla^2 c_i(x)\| \le \nu_5$ for all $i = 1, \ldots, m$, and $\|W(x)\| \le \nu_6$.

An immediate consequence of Assumptions A4 and A5 is the existence of a constant $\nu_7 > 0$, independent of $k$, such that $\|H_k\| \le \nu_7$, $\|W_k^T H_k\| \le \nu_7$, and $\|W_k^T H_k W_k\| \le \nu_7$.

Assumption A6 means that there exists a constant $\nu_8 > 0$, independent of $k$, such that $\|\lambda_k\| \le \nu_8$.

The  following  three  subsections  are  devoted  to  presenting  lemmas  needed  to  prove 
global  convergence. 

7.2. Properties of the trial step. The following lemma shows that condition (5.1) holds for the normal component $s_k^{ni}$ of $s_k^i$ when it is truly normal to the tangent space.

Lemma 7.1. At the current iterate $x_k$, let the trial step component $s_k^{ni}$ actually be normal to the tangent space. Then, under the problem assumptions, there exists a constant $K_1 > 0$, independent of the iterates, such that

$$\|s_k^{ni}\| \le K_1\|C_k\|. \tag{7.1}$$

Proof. Because $s_k^{ni}$ is actually normal to the tangent space, we have

$$\begin{aligned} \|s_k^{ni}\| &= \|\nabla C_k(\nabla C_k^T\nabla C_k)^{-1}\nabla C_k^T s_k^{ni}\| \\ &= \|\nabla C_k(\nabla C_k^T\nabla C_k)^{-1}(C_k + \nabla C_k^T s_k^{ni} - C_k)\| \\ &\le \|\nabla C_k(\nabla C_k^T\nabla C_k)^{-1}\|\left[\|C_k + \nabla C_k^T s_k^{ni}\| + \|C_k\|\right]. \end{aligned}$$

Now, using the fact that $\|C_k + \nabla C_k^T s_k^{ni}\| \le \|C_k\|$, we have

$$\|s_k^{ni}\| \le 2\,\|\nabla C_k(\nabla C_k^T\nabla C_k)^{-1}\|\,\|C_k\|.$$

The rest follows from the problem assumptions. □
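Here is a small numerical check of Lemma 7.1, under the illustrative assumptions $n = 2$, $m = 1$, $\nabla C_k = (1, 1)^T$, and $C_k = -1$. Every step normal to the tangent space then has the form $s = t(1, 1)^T$. For $m = 1$, $\|\nabla C_k(\nabla C_k^T\nabla C_k)^{-1}\| = 1/\|\nabla C_k\|$, so the bound in the proof reads $\|s\| \le 2\|C_k\|/\|\nabla C_k\|$. The sketch samples normal steps that satisfy $\|C_k + \nabla C_k^T s\| \le \|C_k\|$ and confirms that the bound holds. It is attained at $t = 1$, so the factor 2 cannot be improved in general.

```python
import math


def normal_step_bound(grad_c, c):
    """2 ||grad C (grad C^T grad C)^{-1}|| ||C|| for a single constraint (m = 1)."""
    return 2.0 * abs(c) / math.sqrt(sum(g * g for g in grad_c))


grad_c, c = [1.0, 1.0], -1.0
bound = normal_step_bound(grad_c, c)
checked = []
for i in range(-5, 16):
    t = i / 10
    s = [t * g for g in grad_c]                         # normal to the tangent space
    lin = c + sum(g * si for g, si in zip(grad_c, s))   # C + grad C^T s
    if abs(lin) <= abs(c):                              # hypothesis used in the proof
        checked.append((t, math.hypot(*s)))
print(bound, all(norm <= bound + 1e-12 for _, norm in checked))  # ~1.4142 True
```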

The following lemma expresses the pair of fraction of Cauchy decrease conditions imposed on the trial steps in a workable form.

Lemma 7.2. Let the trial steps satisfy the conditions given in step 2 of Algorithm 6.1. Then, under the problem assumptions, there exist positive constants $K_2$, $K_3$, and $K_4$, independent of the iterates, such that

$$\|C_k\|^2 - \|C_k + \nabla C_k^T s_k^{ni}\|^2 \ge K_2\|C_k\|\min\{K_3\|C_k\|, r\delta_k^i\}, \tag{7.2}$$

and

$$q_k(s_k^{ni}) - q_k(s_k^i) \ge \frac{\sigma}{2}\|W_k^T\nabla q_k(s_k^{ni})\|\min\left\{\frac{1-r}{\nu_6}\delta_k^i,\ K_4\|W_k^T\nabla q_k(s_k^{ni})\|\right\}. \tag{7.3}$$

Proof. The proof applies Lemma 2.1 to the two subproblems and then uses the problem assumptions and (5.3). □
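Lemma 2.1 is stated earlier in the report and is not reproduced here. The sketch below assumes it is the standard Cauchy-point estimate: for $m(u) = g^T u + \frac12 u^T B u$, the Cauchy point $u^{cp}$ in $\|u\| \le \Delta$ satisfies $m(0) - m(u^{cp}) \ge \frac12\|g\|\min\{\Delta, \|g\|/\|B\|\}$. The Python code checks this inequality numerically. Applied with $g = W_k^T\nabla q_k(s_k^{ni})$ and $B = W_k^T H_k W_k$, and combined with $\|W_k^T H_k W_k\| \le \nu_7$, an estimate of this kind underlies the min-term in (7.3). The diagonal test matrices are our own choice; for them, the 2-norm is the largest absolute diagonal entry.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def cauchy_decrease(g, B, delta):
    """m(0) - m(u_cp) for m(u) = g^T u + 0.5 u^T B u on ||u|| <= delta."""
    gg = dot(g, g)
    if gg == 0.0:
        return 0.0
    t = delta / math.sqrt(gg)            # boundary step along -g
    curv = dot(g, matvec(B, g))
    if curv > 0.0:
        t = min(t, gg / curv)            # interior minimizer along -g, if it fits
    u = [-t * gi for gi in g]
    return -(dot(g, u) + 0.5 * dot(u, matvec(B, u)))


def cauchy_bound(g, norm_B, delta):
    """Standard estimate 0.5 ||g|| min(delta, ||g|| / ||B||)."""
    gn = math.sqrt(dot(g, g))
    return 0.5 * gn * (delta if norm_B == 0.0 else min(delta, gn / norm_B))


# (g, B, ||B||_2, delta); B is diagonal, so ||B||_2 is its largest |diagonal entry|.
cases = [([1.0, 0.0], [[2.0, 0.0], [0.0, 1.0]], 2.0, 10.0),   # interior 1-D minimizer
         ([1.0, 0.0], [[2.0, 0.0], [0.0, 1.0]], 2.0, 0.1),    # step cut by the radius
         ([1.0, 1.0], [[-1.0, 0.0], [0.0, 1.0]], 1.0, 1.0)]   # zero curvature along -g
for g, B, norm_B, delta in cases:
    print(cauchy_decrease(g, B, delta) >= cauchy_bound(g, norm_B, delta))
```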

We now consider trial steps that satisfy inequalities (7.2) and (7.3). In what follows, we implicitly use the fact that $\nabla C_k^T s_k^i = \nabla C_k^T s_k^{ni}$, since the tangential component lies in the null space of $\nabla C_k^T$.

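Lemma 7.3. Under the problem assumptions, there exists a constant $K_5 > 0$, independent of the iterates, such that

$$q_k(0) - q_k(s_k^{ni}) - \Delta\lambda_k^T(C_k + \nabla C_k^T s_k^{ni}) \ge -K_5\|C_k\|. \tag{7.4}$$

Proof. Consider

$$\begin{aligned} q_k(0) - q_k(s_k^{ni}) &= -\nabla_x\ell_k^T s_k^{ni} - \tfrac{1}{2}(s_k^{ni})^T H_k s_k^{ni} \\ &\ge -\|\nabla_x\ell_k\|\,\|s_k^{ni}\| - \tfrac{1}{2}\|H_k\|\,\|s_k^{ni}\|^2 \\ &= -\left(\|\nabla_x\ell_k\| + \tfrac{1}{2}\|H_k\|\,\|s_k^{ni}\|\right)\|s_k^{ni}\|. \end{aligned}$$

We now combine (5.1) with the problem assumptions and the following facts: $\|s_k^{ni}\| \le \delta_{\max}$, $\lambda_k$ and $\Delta\lambda_k$ are bounded, and $\|C_k + \nabla C_k^T s_k^{ni}\| \le \|C_k\|$. This gives

$$q_k(0) - q_k(s_k^{ni}) - \Delta\lambda_k^T(C_k + \nabla C_k^T s_k^{ni}) \ge -K_5\|C_k\|,$$

which is the desired result. □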

The following lemma gives an upper bound on the difference between the actual reduction and the predicted reduction.

Lemma 7.4. Under the problem assumptions, there exist positive constants $K_6$, $K_7$, and $K_8$, independent of $k$, such that

$$|\mathrm{Ared}_k(s_k^i;\rho_k^i) - \mathrm{Pred}_k(s_k^i;\rho_k^i)| \le K_6\|s_k^i\|^2 + K_7\rho_k^i\|s_k^i\|^3 + K_8\rho_k^i\|s_k^i\|^2\|C_k\|. \tag{7.5}$$
Proof. The proof follows directly from El-Alem [9]. □

If the penalty parameter were uniformly bounded, the next lemma would show that the predicted reduction approximates the actual reduction of the merit function to within the square of the step length.

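Lemma 7.5. Under the problem assumptions, there exists a constant $K_9 > 0$, independent of $k$, such that

$$|\mathrm{Ared}_k(s_k^i;\rho_k^i) - \mathrm{Pred}_k(s_k^i;\rho_k^i)| \le K_9\rho_k^i\|s_k^i\|^2. \tag{7.6}$$

Proof. The proof follows directly from the above lemma, the fact that $\|s_k^i\|$ and $\|C_k\|$ are bounded, and the fact that the penalty parameters never decrease from $\rho_{-1} = 1$, so that $\rho_k^i \ge 1$. □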
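Here is a numerical illustration of Lemma 7.5 on a toy problem of our own choosing: $f(x) = x_1$, $C(x) = x_1^2 + x_2^2 - 1$, $\lambda = \Delta\lambda = 0$, $H = 0$, $x_k = (2, 0)^T$, and $s = (h, 0)^T$. Here the Lagrangian part of Pred is exact. A direct expansion gives $\mathrm{Ared} - \mathrm{Pred} = -\rho\left(2(C_k + \nabla C_k^T s)\|s\|^2 + \|s\|^4\right)$. Hence $|\mathrm{Ared} - \mathrm{Pred}|/(\rho\|s\|^2) = 6 + 8h + h^2$ stays bounded as $h \to 0$, consistent with (7.6). The terms of this expansion mirror the $\rho\|s\|^2\|C_k\|$ and $\rho\|s\|^3$ terms of (7.5).

```python
def ared_minus_pred(h: float, rho: float) -> float:
    """Ared - Pred for the toy problem with x_k = (2, 0) and s = (h, 0)."""
    f = lambda x: x[0]
    C = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
    xk, s = (2.0, 0.0), (h, 0.0)
    xp = (xk[0] + s[0], xk[1] + s[1])
    grad_f, grad_C = (1.0, 0.0), (2 * xk[0], 2 * xk[1])
    lin = C(xk) + grad_C[0] * s[0] + grad_C[1] * s[1]          # C_k + grad C_k^T s
    ared = f(xk) - f(xp) + rho * (C(xk) ** 2 - C(xp) ** 2)      # (5.5) with lambda = 0
    pred = -(grad_f[0] * s[0] + grad_f[1] * s[1]) + rho * (C(xk) ** 2 - lin ** 2)  # (5.6), H = 0
    return ared - pred


for rho in (1.0, 10.0):
    for h in (1e-1, 1e-2, 1e-3):
        print(rho, h, abs(ared_minus_pred(h, rho)) / (rho * h * h))
```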

7.3. The decrease in the model. This section deals with the predicted decrease in the merit function produced by the trial step. We start with a lemma.

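Lemma 7.6. Let $s_k^i$ be generated by Algorithm 6.1. Then, under the problem assumptions, for any positive $\rho$ the predicted decrease in the merit function satisfies

$$\begin{aligned} \mathrm{Pred}_k(s_k^i;\rho) \ge{}& \frac{\sigma}{2}\|W_k^T\nabla q_k(s_k^{ni})\|\min\left\{K_4\|W_k^T\nabla q_k(s_k^{ni})\|, \frac{1-r}{\nu_6}\delta_k^i\right\} \\ & - K_5\|C_k\| + \rho\left[\|C_k\|^2 - \|\nabla C_k^T s_k^i + C_k\|^2\right], \end{aligned} \tag{7.7}$$

where $K_5$ is as in Lemma 7.3.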

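Proof. We have

$$\begin{aligned} \mathrm{Pred}_k(s_k^i;\rho) &= q_k(0) - q_k(s_k^i) - \Delta\lambda_k^T(C_k + \nabla C_k^T s_k^i) + \rho\left[\|C_k\|^2 - \|\nabla C_k^T s_k^i + C_k\|^2\right] \\ &= \left(q_k(s_k^{ni}) - q_k(s_k^i)\right) + \left(q_k(0) - q_k(s_k^{ni})\right) - \Delta\lambda_k^T(C_k + \nabla C_k^T s_k^{ni}) \\ &\quad + \rho\left[\|C_k\|^2 - \|\nabla C_k^T s_k^i + C_k\|^2\right]. \end{aligned}$$

From (7.3) and Lemma 7.3, we have

$$\mathrm{Pred}_k(s_k^i;\rho) \ge \frac{\sigma}{2}\|W_k^T\nabla q_k(s_k^{ni})\|\min\left\{K_4\|W_k^T\nabla q_k(s_k^{ni})\|, \frac{1-r}{\nu_6}\delta_k^i\right\} - K_5\|C_k\| + \rho\left[\|C_k\|^2 - \|\nabla C_k^T s_k^i + C_k\|^2\right].$$

Hence the result is established. □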

If $x_k$ is feasible, then the predicted reduction does not depend on the penalty parameter, so we keep the penalty parameter from the previous iteration. The question now is how close to feasibility an iterate must be for the penalty parameter not to need an increase. The answer is given by the following lemma.

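Lemma 7.7. Assume that the algorithm does not terminate at the current iterate. If $\|C_k\| < \alpha\delta_k^i$, where $\alpha$ satisfies

$$\alpha < \min\left\{\frac{\epsilon_{tol}}{3\delta_{\max}},\ \frac{\epsilon_{tol}}{3\nu_7K_1\delta_{\max}},\ \frac{\sigma\epsilon_{tol}}{12K_5}\min\left\{\frac{K_4\epsilon_{tol}}{3\delta_{\max}}, \frac{1-r}{\nu_6}\right\}\right\}, \tag{7.8}$$

then, for any positive $\rho$,

$$\mathrm{Pred}_k(s_k^i;\rho) \ge \frac{\sigma}{4}\|W_k^T\nabla q_k(s_k^{ni})\|\min\left\{K_4\|W_k^T\nabla q_k(s_k^{ni})\|, \frac{1-r}{\nu_6}\delta_k^i\right\} + \rho\left[\|C_k\|^2 - \|\nabla C_k^T s_k^i + C_k\|^2\right]. \tag{7.9}$$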


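Proof. If the algorithm does not terminate at $x_k$, then $\|W_k^T\nabla_x\ell_k\| + \|C_k\| > \epsilon_{tol}$. Since $\|C_k\| < \alpha\delta_k^i \le \alpha\delta_{\max}$ with $\alpha < \frac{\epsilon_{tol}}{3\delta_{\max}}$, we have $\|C_k\| < \frac{\epsilon_{tol}}{3}$. Therefore the reduced gradient satisfies $\|W_k^T\nabla_x\ell_k\| > \frac{2}{3}\epsilon_{tol}$. Now,

$$\begin{aligned} \|W_k^T\nabla q_k(s_k^{ni})\| &= \|W_k^T(\nabla_x\ell_k + H_k s_k^{ni})\| \\ &\ge \|W_k^T\nabla_x\ell_k\| - \|W_k^T H_k s_k^{ni}\| \\ &\ge \tfrac{2}{3}\epsilon_{tol} - \nu_7K_1\|C_k\| \ge \tfrac{2}{3}\epsilon_{tol} - \nu_7K_1\alpha\delta_k^i. \end{aligned}$$

But since $\alpha < \frac{\epsilon_{tol}}{3\nu_7K_1\delta_{\max}}$, it follows that

$$\|W_k^T\nabla q_k(s_k^{ni})\| \ge \frac{\epsilon_{tol}}{3}.$$

From Lemma 7.6, we have

$$\mathrm{Pred}_k(s_k^i;\rho) \ge \frac{\sigma}{2}\|W_k^T\nabla q_k(s_k^{ni})\|\min\left\{\frac{1-r}{\nu_6}\delta_k^i, K_4\|W_k^T\nabla q_k(s_k^{ni})\|\right\} - K_5\|C_k\| + \rho\left[\|C_k\|^2 - \|\nabla C_k^T s_k^i + C_k\|^2\right].$$

Since $\|W_k^T\nabla q_k(s_k^{ni})\| \ge \frac{\epsilon_{tol}}{3}$ and $\delta_k^i \le \delta_{\max}$, half of the first term on the right is bounded below by $\frac{\sigma\epsilon_{tol}}{12}\min\left\{\frac{1-r}{\nu_6}, \frac{K_4\epsilon_{tol}}{3\delta_{\max}}\right\}\delta_k^i$. Moreover, $\|C_k\| < \alpha\delta_k^i$. Thus

$$\begin{aligned} \mathrm{Pred}_k(s_k^i;\rho) \ge{}& \frac{\sigma}{4}\|W_k^T\nabla q_k(s_k^{ni})\|\min\left\{\frac{1-r}{\nu_6}\delta_k^i, K_4\|W_k^T\nabla q_k(s_k^{ni})\|\right\} \\ & + \frac{\sigma\epsilon_{tol}}{12}\min\left\{\frac{1-r}{\nu_6}, \frac{K_4\epsilon_{tol}}{3\delta_{\max}}\right\}\delta_k^i - K_5\alpha\delta_k^i + \rho\left[\|C_k\|^2 - \|\nabla C_k^T s_k^i + C_k\|^2\right]. \end{aligned}$$

Since

$$\alpha < \frac{\sigma\epsilon_{tol}}{12K_5}\min\left\{\frac{K_4\epsilon_{tol}}{3\delta_{\max}}, \frac{1-r}{\nu_6}\right\},$$

we have

$$\mathrm{Pred}_k(s_k^i;\rho) \ge \frac{\sigma}{4}\|W_k^T\nabla q_k(s_k^{ni})\|\min\left\{K_4\|W_k^T\nabla q_k(s_k^{ni})\|, \frac{1-r}{\nu_6}\delta_k^i\right\} + \rho\left[\|C_k\|^2 - \|\nabla C_k^T s_k^i + C_k\|^2\right].$$

This completes the proof. □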

Take $\rho = \rho_k^{i-1}$ in (7.9). Since the bracketed term is nonnegative, (7.9) guarantees the following: if the algorithm does not terminate and $\|C_k\| < \alpha\delta_k^i$, then the penalty parameter at the current trial step does not need to be increased in step 4 of Algorithm 6.1. In other words, the penalty parameter can be increased only when $\|C_k\| \ge \alpha\delta_k^i$.

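Lemma 7.8. Given $\epsilon_{tol} > 0$, there exists $K_{10} > 0$, which depends on $\epsilon_{tol}$ but not on $k$ or $i$, with the following property. At any trial step $s_k^i$ of iteration $k$ at which the algorithm does not terminate and $\|C_k\| < \alpha\delta_k^i$, where $\alpha$ is as in Lemma 7.7,

$$\mathrm{Pred}_k(s_k^i;\rho_k^i) \ge K_{10}\delta_k^i. \tag{7.10}$$

Proof. The algorithm does not terminate, and $\|C_k\| < \alpha\delta_k^i$ with $\alpha$ as in (7.8). We use (7.9), whose last term is nonnegative by (7.2), together with an argument similar to the one in Lemma 7.7. This gives

$$\mathrm{Pred}_k(s_k^i;\rho_k^i) \ge \frac{\sigma\epsilon_{tol}}{12}\min\left\{\frac{K_4\epsilon_{tol}}{3}, \frac{1-r}{\nu_6}\delta_k^i\right\} \ge \frac{\sigma\epsilon_{tol}}{12}\min\left\{\frac{1-r}{\nu_6}, \frac{K_4\epsilon_{tol}}{3\delta_{\max}}\right\}\delta_k^i.$$

Defining

$$K_{10} = \frac{\sigma\epsilon_{tol}}{12}\min\left\{\frac{1-r}{\nu_6}, \frac{K_4\epsilon_{tol}}{3\delta_{\max}}\right\},$$

we have $\mathrm{Pred}_k(s_k^i;\rho_k^i) \ge K_{10}\delta_k^i$, which is the desired result. □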

In the next subsection we discuss the role of the penalty parameter in the global convergence of the nonlinear programming algorithm.

7.4. The behavior of the penalty parameter. In this section we discuss the behavior of the penalty parameter. The crucial result here is that the trust-region radii $\delta_k^i$ are bounded away from zero at those trial steps at which the penalty parameter is increased. This will allow us to conclude that the sequence $\{\rho_k^i\}$ of penalty parameters is bounded.

According to the rule for updating the penalty parameter, we keep the penalty parameter from the previous trial step if the predicted decrease with the old penalty parameter is at least a fraction of the decrease in the quadratic model of the linearized constraints. That is, if

$$\mathrm{Pred}_k(s_k^i;\rho_k^{i-1}) \ge \frac{\rho_k^{i-1}}{2}\left[\|C_k\|^2 - \|C_k + \nabla C_k^T s_k^i\|^2\right], \tag{7.11}$$

then $\rho_k^i = \rho_k^{i-1}$. Otherwise, we use $\rho_k^i = \bar\rho_k^i + \beta$, which enforces (5.8). See Section 5.6.

Lemma 7.9. Let $\{\rho_k^i\}$ be the sequence of penalty parameters generated by the algorithm. Then

1. $\{\rho_k^i\}$ is a nondecreasing sequence.

2. If the penalty parameter is increased, it increases by at least $\beta$.

3. If the penalty parameter is not increased, then inequality (7.11) holds.

Proof. The proof is straightforward. □

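Lemma 7.10. Let $k, i$ be any pair of indices such that $\rho_k^i$ is increased at the $i$th trial step of the $k$th iteration. If the algorithm does not terminate at $x_k$, then there exists $K_{11} > 0$, which depends on $\epsilon_{tol}$ but not on $k$ or $i$, such that for every $j \ge i$,

$$\rho_k^j\delta_k^j \le K_{11}. \tag{7.12}$$

Proof. If $\rho_k^i$ is increased at the $i$th trial step of the $k$th iteration, then it is updated by the rule

$$\rho_k^i = \frac{2\left[q_k(s_k^i) - q_k(0) + \Delta\lambda_k^T(C_k + \nabla C_k^T s_k^i)\right]}{\|C_k\|^2 - \|C_k + \nabla C_k^T s_k^i\|^2} + \beta.$$

Hence,

$$\begin{aligned} \frac{\rho_k^i}{2}\left[\|C_k\|^2 - \|C_k + \nabla C_k^T s_k^{ni}\|^2\right] ={}& \left[q_k(s_k^i) - q_k(0)\right] + \Delta\lambda_k^T(C_k + \nabla C_k^T s_k^{ni}) + \frac{\beta}{2}\left[\|C_k\|^2 - \|C_k + \nabla C_k^T s_k^{ni}\|^2\right] \\ ={}& \left[q_k(s_k^i) - q_k(s_k^{ni})\right] + \left[q_k(s_k^{ni}) - q_k(0)\right] + \Delta\lambda_k^T(C_k + \nabla C_k^T s_k^{ni}) \\ & + \frac{\beta}{2}\left[-2(\nabla C_kC_k)^T s_k^{ni} - \|\nabla C_k^T s_k^{ni}\|^2\right]. \end{aligned}$$

Applying (7.2) to the left-hand side, and (7.3) and Lemma 7.3 to the right-hand side, we obtain

$$\begin{aligned} \frac{\rho_k^i}{2}K_2\|C_k\|\min\{r\delta_k^i, K_3\|C_k\|\} &\le -\frac{\sigma}{2}\|W_k^T\nabla q_k(s_k^{ni})\|\min\left\{K_4\|W_k^T\nabla q_k(s_k^{ni})\|, \frac{1-r}{\nu_6}\delta_k^i\right\} \\ &\quad + K_5\|C_k\| - \beta(\nabla C_kC_k)^T s_k^{ni} - \frac{\beta}{2}\|\nabla C_k^T s_k^{ni}\|^2 \\ &\le K_5\|C_k\| - \beta(\nabla C_kC_k)^T s_k^{ni} \\ &\le K_5\|C_k\| + \beta\|\nabla C_k\|\,\|C_k\|\,\|s_k^{ni}\| \\ &\le \left(K_5 + \beta\|\nabla C_k\|\,\|s_k^{ni}\|\right)\|C_k\|. \end{aligned}$$

Then, using $\|\nabla C_k\| \le \nu_2$ and $\|s_k^{ni}\| \le \delta_{\max}$,

$$\frac{K_2}{2}\rho_k^i\min\{r\delta_k^i, K_3\|C_k\|\} \le K_5 + \beta\nu_2\delta_{\max}.$$

Since the penalty parameter increases at the current trial step, Lemma 7.7 gives $\|C_k\| \ge \alpha\delta_k^i$. Hence

$$\frac{K_2}{2}\rho_k^i\min\{r\delta_k^i, K_3\alpha\delta_k^i\} \le K_5 + \beta\nu_2\delta_{\max},$$

and

$$\rho_k^i\delta_k^i \le \frac{2K_5 + 2\beta\nu_2\delta_{\max}}{K_2\min\{r, K_3\alpha\}}.$$

Now, if $j \ge i$, then $\delta_k^j \le \delta_k^i$. Assume without loss of generality that $\rho_k^j = \rho_k^i$, i.e., that the $i$th trial step was the most recent increase with respect to $j$. Then $\rho_k^j\delta_k^j \le \rho_k^i\delta_k^i$. Defining

$$K_{11} = \frac{2K_5 + 2\beta\nu_2\delta_{\max}}{K_2\min\{r, K_3\alpha\}},$$

we obtain the desired result. □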

The following lemma gives a lower bound for the sequence $\{\delta_k^i\}$ at those iterates where the algorithm does not terminate and the penalty parameter is increased. In the next section, we will be able to do away with the assumption that the penalty parameter is increased.

Lemma 7.11. Let the penalty parameter be increased at the $i$th trial step of the $k$th iteration. Then, under the problem assumptions, if the algorithm does not terminate, there exists $\bar\delta$, which depends on $\epsilon_{tol}$ but not on the iterates, such that

$$\delta_k^i \ge \bar\delta. \tag{7.13}$$


Proof. To begin, suppose $i = 0$, i.e., we are at the first trial step of iteration $k$. Then, by Algorithm 5.1, $\delta_k^0 \ge \delta_{\min}$, because the radius can drop below $\delta_{\min}$ only after a trial step of iteration $k$ has been rejected. Thus, we can restrict our attention to the case $i \ge 1$.

Our proof consists of showing that there exists $\bar\delta$ such that $\delta_k^i \ge \bar\delta$, whether or not $s_k^i$ is acceptable. Recall that for every rejected trial step we have $\delta_k^{j+1} = \alpha_1\|s_k^j\|$. We consider two cases:

i) $\|C_k\| \ge \alpha\delta_k^j$ for all $j = 0, \ldots, i$.

ii) $\|C_k\| \ge \alpha\delta_k^j$ fails for some $j$ between $0$ and $i$.

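i) Consider the case where the constraint violation satisfies $\|C_k\| \ge \alpha\delta_k^j$ for all $j = 0, \ldots, i$. From Lemma 7.5, we have

$$|\mathrm{Ared}_k(s_k^j;\rho_k^j) - \mathrm{Pred}_k(s_k^j;\rho_k^j)| \le K_9\rho_k^j\|s_k^j\|^2.$$

Now, since $\|C_k\| \ge \alpha\delta_k^j$, the way $\rho_k^j$ is updated and inequality (7.2) give

$$\begin{aligned} \mathrm{Pred}_k(s_k^j;\rho_k^j) &\ge \frac{\rho_k^j}{2}\left[\|C_k\|^2 - \|C_k + \nabla C_k^T s_k^j\|^2\right] \\ &\ge \frac{\rho_k^j}{2}K_2\|C_k\|\min\{K_3\alpha, r\}\delta_k^j. \end{aligned}$$

Hence, since $\|s_k^j\| \le \delta_k^j$,

$$\frac{|\mathrm{Ared}_k(s_k^j;\rho_k^j) - \mathrm{Pred}_k(s_k^j;\rho_k^j)|}{\mathrm{Pred}_k(s_k^j;\rho_k^j)} \le \frac{2K_9\|s_k^j\|}{K_2\|C_k\|\min\{K_3\alpha, r\}}. \tag{7.14}$$

Since all the steps $s_k^j$ for $j = 0, \ldots, i-1$ are rejected, it must be the case that

$$1 - \eta_1 < \frac{|\mathrm{Ared}_k(s_k^j;\rho_k^j) - \mathrm{Pred}_k(s_k^j;\rho_k^j)|}{\mathrm{Pred}_k(s_k^j;\rho_k^j)}. \tag{7.15}$$

So from (7.14) and (7.15), we have

$$\|s_k^j\| > \frac{(1-\eta_1)K_2\min\{\alpha K_3, r\}}{2K_9}\|C_k\|, \qquad j = 0, \ldots, i-1. \tag{7.16}$$

Since $\delta_k^i = \alpha_1\|s_k^{i-1}\|$ and $\|C_k\| \ge \alpha\delta_k^0$, it follows that

$$\delta_k^i = \alpha_1\|s_k^{i-1}\| > \alpha_1\frac{(1-\eta_1)K_2\min\{\alpha K_3, r\}}{2K_9}\,\alpha\delta_k^0. \tag{7.17}$$

Now, by the rule for updating the trust-region radius, we know that $\delta_k^0 \ge \delta_{\min}$. Then

$$\delta_k^i \ge \frac{\alpha_1(1-\eta_1)K_2\min\{\alpha K_3, r\}}{2K_9}\,\alpha\delta_{\min} \equiv K_{12}. \tag{7.18}$$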

ii)  If  ||Ct||  >  a63k  does  not  hold  for  all  j  —  0,  •  •  • ,  i,  then  there  exists  a  largest  index  /, 
0  <  /  <  i,  such  that  ||C*||  <  holds. 

If  i  =  l  +  1  then,  from  the  way  of  updating  the  trust-region  radius,  b'k  =  a\  ||s^||. 
On  the  other  hand,  if  i  £  l  +  1,  since  ||C*||  >  a6Jk ,  for  all  j  =  l  +  1,  •  •  • ,  i,  then  from 
(7.16)  we  have 


Now,  because  s'k  1  and  s‘k+l  are  rejected  trial  steps  and  using  ||Cfc||  >  aSl+l , 
write 


we  can 


(7.19) 


<5;  =  aj||s 


r*i 


^  _  (l  -  m)K2mm{aK3,  r} _ 

-  1  2a; - l|Cfc|1 

>  n,q  (1  -  m)K2mm{aK3,  r}gt+1 

2A'g  k 


>  a\a 


2  (1  -  7/i)A2min{aJr3,  r} 


2  A’q 


t-  .  f  2  (1  -  ?h)A'2min{aA'3,  r} , 
Ai3  =  min{aj,afa - - — 1 - — 1), 

2Kg  ’ 


So,  if  we  set 
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then  we  have 


(7.20)  S‘  >  A'13||4ll- 

Therefore,  using  the  above  inequality  and  Lemma  7.10, 


4 114 


~PkKl3~K13  14 ' 


From  (7.5)  we  have 

\Aredk(s‘k;plk)  -  Predfc(4;4)|  <  [A's  +  (K7  +  aKs)p[  ||4 1|]||4 
Therefore, 


(7.21)  \Aredk(s[;plk)- Predk(slk;plk)\  <  [A'6  +  (Kr  +  air8)A'14]||4 1|4- 

Also,  since  ||C*||  <  a6lk,  then  from  Lemma  7.8,  we  have 

(7-22)  Prerf*(4;/4)  >  A"io4- 

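Using (7.21), (7.22), the fact that $s_k^l$ is rejected, and $\|s_k^l\| \le \delta_k^l$, we obtain

$$1-\eta_1 < \left|\frac{\mathrm{Ared}_k(s_k^l;\rho_k^l)}{\mathrm{Pred}_k(s_k^l;\rho_k^l)}-1\right| \le \frac{\bigl[K_6+K_7K_{14}+\alpha K_8K_{14}\bigr]\|s_k^l\|^2}{K_{10}\,\delta_k^l} \le \frac{\bigl[K_6+K_7K_{14}+\alpha K_8K_{14}\bigr]\|s_k^l\|}{K_{10}}.$$

Hence

$$\|s_k^l\| \ge \frac{(1-\eta_1)K_{10}}{K_6+K_7K_{14}+\alpha K_8K_{14}}. \tag{7.23}$$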

Now, using (7.20) and (7.23), we obtain the bound

$$\delta_k^i \ge K_{13}\,\frac{(1-\eta_1)K_{10}}{K_6+K_7K_{14}+\alpha K_8K_{14}} \equiv K_{15}.$$

Defining

$$\delta = \min\{\delta_{\min},\ K_{12},\ K_{15}\},$$

we obtain the desired bound. □
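The bookkeeping in this proof can be checked numerically. The following is a minimal sketch that assembles $\delta = \min\{\delta_{\min}, K_{12}, K_{15}\}$ from sample constants. It uses the forms of $K_{12}$, $K_{13}$, and $K_{15}$ from the proof. It also assumes that $K_{14} = K_{11}/K_{13}$, where $K_{11}$ is taken to be the constant bounding $\rho_k^i\delta_k^i$ in Lemma 7.10; that form of Lemma 7.10 is an assumption. The function name and the numbers are illustrative only.

```python
def lemma_7_11_radius_bound(alpha, alpha1, eta1, r, delta_min,
                            K2, K3, K6, K7, K8, K9, K10, K11):
    """Lower bound delta = min{delta_min, K12, K15} from the proof of Lemma 7.11.

    Hypothetical helper: K11 is ASSUMED to be the constant of Lemma 7.10
    (rho_k^i * delta_k^i <= K11); all other names follow the proof.
    """
    # Common factor (1 - eta1) K2 min{alpha K3, r} / (2 K9), cf. (7.16).
    c = (1.0 - eta1) * K2 * min(alpha * K3, r) / (2.0 * K9)
    K12 = alpha1 * alpha * c * delta_min        # case (i), (7.18)
    K13 = min(alpha1, alpha1**2 * alpha * c)    # case (ii), (7.20)
    K14 = K11 / K13                             # bound on rho_k^l ||s_k^l||
    K15 = K13 * (1.0 - eta1) * K10 / (K6 + K7 * K14 + alpha * K8 * K14)
    return min(delta_min, K12, K15)


# Sample values (illustrative only); here K15 = 9/33280 is the smallest term.
delta = lemma_7_11_radius_bound(alpha=0.5, alpha1=0.5, eta1=0.25, r=0.8,
                                delta_min=0.1, K2=1, K3=1, K6=1, K7=1,
                                K8=1, K9=1, K10=1, K11=1)
```

With these values $K_{13} = 3/128$, $K_{14} = 128/3$, and the denominator of $K_{15}$ is exactly $65$. This shows how quickly the constants shrink, even though only their positivity matters for the lemma.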

Now we can show that the nondecreasing sequence of penalty parameters generated
by the nonlinear programming Algorithm 6.1 is bounded.

Lemma 7.12. Under the problem assumptions, if the algorithm does not terminate, then there is some $\rho_*$, which depends on $\varepsilon_{tol}$, for which

$$\lim_{k\to\infty}\rho_k = \rho_* < \infty. \tag{7.24}$$

Furthermore, there exists some index $k_\rho$ such that $\rho_k = \rho_*$ for every $k \ge k_\rho$.

Proof. We need to show that there is a $\rho_*$ with $\rho_* \ge \rho_k^i$ for all pairs $k, i$. Clearly, it suffices to consider the sequence $\{\rho_k^i\}$ of distinct values of $\rho_k$, where the double index $k,i$ means that the penalty constant was increased to $\rho_k^i$ at the $i$th trial step of the $k$th iteration. Thus, there may be no terms, or more than one term, for a given $k$. Then from Lemma 7.10 and Lemma 7.11, we have

$$\rho_k^i \le \frac{K_{11}}{\delta_k^i} \le \frac{K_{11}}{\delta}.$$

Therefore $\{\rho_k^i\}$ is a bounded sequence, and since it is nondecreasing, there exists $\rho_* < \infty$ such that

$$\lim_{k\to\infty}\rho_k = \rho_*.$$

Now, since the existence of $\rho_*$ ensures that $\rho_k$ is bounded, and since we know that whenever it is increased it is increased by at least $\beta$, there can be only finitely many increases, and the proof is complete. □
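The finiteness argument at the end of the proof is a one-line count. Write $\rho_0$ for the initial penalty parameter (a label introduced here) and $\beta$ for the minimum increment. If $N$ increases occur, then

$$\rho_0 + N\beta \;\le\; \rho_* \quad\Longrightarrow\quad N \;\le\; \frac{\rho_* - \rho_0}{\beta},$$

so all increases happen before some finite iteration index, which can serve as $k_\rho$.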

This  last  result  and  the  following  one  will  play  crucial  roles  in  the  proof  of  the 
global  convergence  of  Algorithm  6.1. 

Lemma  7.13.  Under  the  problem  assumptions,  if  the  algorithm  does  not  terminate 
then the augmented Lagrangian is bounded on $\Omega$.

Proof.  The  proof  is  immediate  from  the  boundedness  of  the  penalty  constant  and 
the  problem  assumptions.  □ 

8. The main global convergence results. This section is devoted to presenting
our main global convergence results. We start with the finite termination theorem,
where we show that the general nonlinear programming algorithm is well-defined. In
Section 8.1, we also present more properties of the trust-region radius sequence
generated by the algorithm under the assumption that it does not terminate. In
Section 8.2, we prove global convergence of our algorithm.

8.1. The finite termination theorem. The following theorem shows that the
nonlinear programming Algorithm 6.1 is well-defined in the sense that at each iteration
we can find an acceptable step after a finite number of trial step computations or,
equivalently, trust-region reductions. This will allow us to drop the consideration of
trial steps and consider only "successful trial steps," $\{s_k\}$.

THEOREM 8.1. Under the problem assumptions, unless some iterate $x_k$ satisfies
the termination condition of Algorithm 6.1, an acceptable step from $x_k$ will be found
after finitely many trial steps.

Proof. The proof follows from Theorem 5.1 of El-Alem [9]. □
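Theorem 8.1 concerns the inner loop of an iteration: trial steps are computed and tested, and the radius is shrunk after each rejection. The following NumPy sketch shows that loop in generic form. It assumes the acceptance test $\mathrm{Ared}_k \ge \eta_1\,\mathrm{Pred}_k$ and the reset $\delta_k^{i+1} = \alpha_1\|s_k^i\|$ for rejected steps that appear in the analysis, and it omits the penalty update. `find_acceptable_step`, the callables it takes, and the toy quartic merit function below are hypothetical stand-ins for the quantities of Algorithm 6.1.

```python
import numpy as np


def find_acceptable_step(x, delta, trial_step, ared, pred,
                         eta1=0.1, alpha1=0.5, max_trials=200):
    """Reject-and-shrink loop of one trust-region iteration (sketch).

    trial_step(x, delta) must return s with ||s|| <= delta;
    ared(x, s) and pred(x, s) return actual and predicted reductions.
    Returns (s, delta, number_of_rejected_trials).
    """
    for i in range(max_trials):
        s = trial_step(x, delta)
        p = pred(x, s)
        if p > 0 and ared(x, s) >= eta1 * p:
            return s, delta, i               # accepted after i rejections
        delta = alpha1 * np.linalg.norm(s)   # rejected: shrink the radius
    raise RuntimeError("no acceptable step within max_trials")


# Toy usage (hypothetical): merit phi(x) = ||x||^4, linear model along -grad.
phi = lambda x: float(x @ x) ** 2
grad = lambda x: 4.0 * float(x @ x) * x
step = lambda x, d: -d * grad(x) / np.linalg.norm(grad(x))
ared = lambda x, s: phi(x) - phi(x + s)
pred = lambda x, s: -float(grad(x) @ s)

s, d, n_rejected = find_acceptable_step(np.array([1.0, 0.0]), 10.0,
                                        step, ared, pred)
# Radii tried: 10, 5, 2.5, 1.25; the last one is accepted (n_rejected == 3).
```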

LEMMA 8.2. Under the problem assumptions, assume that the algorithm does not
terminate. Then there exists $\delta_* > 0$, which depends on $\varepsilon_{tol}$ but does not depend on
the iterates, such that for all $k, i$,

$$\delta_k^i \ge \delta_*. \tag{8.1}$$

Proof. The proof is very similar to the proof of Lemma 7.11.

To begin, we note that if the first trial step is acceptable, then by Algorithm 5.1, $\delta_k$ cannot have become smaller than $\delta_{\min}$ during the course of the iteration. Thus, we can restrict our attention to the case where there is at least one unsuccessful trial step. Let us assume, then, that we have $j$ unsuccessful steps. Our proof consists in showing the existence of $\delta$ such that $\delta_k^j \ge \delta$ whether or not $s_k^j$ is acceptable, i.e., whether or not it is $s_k$. Remember that for all the rejected trial steps we have $\delta_k^{i+1} = \alpha_1\|s_k^i\| < \delta_k^i$.

We consider two cases:

i) $\|C_k\| > \alpha\delta_k^i$ for all $i = 0,\dots,j$.

ii) $\|C_k\| > \alpha\delta_k^i$ does not hold for some $i$ such that $0 \le i \le j$.

The proof of (i) is exactly the same as in the proof of Lemma 7.11, so let us proceed to (ii).

ii) Suppose now that $\|C_k\| > \alpha\delta_k^i$ fails for some $i \in \{0,\dots,j\}$. As in Lemma 7.11, we let $l$ be the largest index such that $\|C_k\| \le \alpha\delta_k^l$ holds. Now, since $\|C_k\| \le \alpha\delta_k^i$ for all $i \le l$ (the radii decrease along rejected steps),
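it follows from Lemma 7.8 that for all such $i$, $\mathrm{Pred}_k(s_k^i;\rho_k^i) \ge K_{10}\,\delta_k^i$. Furthermore, from Lemma 7.5, $|\mathrm{Ared}_k(s_k^i;\rho_k^i) - \mathrm{Pred}_k(s_k^i;\rho_k^i)| \le K_9\,\rho_k^i\|s_k^i\|^2$, and because the step $s_k^i$ is an unacceptable step, we have

$$1-\eta_1 < \left|\frac{\mathrm{Ared}_k(s_k^i;\rho_k^i)}{\mathrm{Pred}_k(s_k^i;\rho_k^i)}-1\right| \le \frac{K_9\,\rho_k^i\|s_k^i\|^2}{K_{10}\,\delta_k^i} \le \frac{K_9\,\rho_*\|s_k^i\|}{K_{10}}.$$

The above inequality implies that, for every rejected step $s_k^i$ with $i \le l$,

$$\delta_k^i \ge \|s_k^i\| > \frac{(1-\eta_1)K_{10}}{K_9\,\rho_*}.$$

For all $i > l$, we have from (7.20) and the above inequality

$$\delta_k^i \ge K_{13}\|s_k^l\| > K_{13}\,\frac{(1-\eta_1)K_{10}}{K_9\,\rho_*}.$$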


It  remains  only  to  collect  the  constants  as  in  Lemma  7.11.  □ 


8.2. The global convergence results. Now we present our main global convergence
result. Namely, under the problem assumptions, the general nonlinear programming
algorithm generates a sequence of iterates $\{x_k\}$ which has at least a subsequence
that converges to a stationary point of problem (EQC). We start with a proof
that if the algorithm does not terminate, the constraint violation $\|C_k\|$ converges to zero.

THEOREM 8.3. Under the problem assumptions, if there exists $\varepsilon_{tol} > 0$ such that

$$\|W_k^T\nabla_x\ell_k\| + \|C_k\| > \varepsilon_{tol}$$

for all $k$, then

$$\lim_{k\to\infty}\|C_k\| = 0. \tag{8.2}$$


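Proof. We prove (8.2) by contradiction. We begin by assuming that there exists an infinite sequence of indices $\{k_j\}$ such that $\|C_{k_j}\|$ is bounded away from zero, i.e., there exists $\varepsilon > 0$ such that $\|C_k\| > \varepsilon$ for all $k \in \{k_j\}$. Now for each $k_j \ge k_\rho$, where $k_\rho$ is as in Lemma 7.12, we have from (5.8) and (7.2) that

$$\begin{aligned}
\mathrm{Pred}_{k_j} &\ge \frac{\rho_*}{2}\left[\|C_{k_j}\|^2 - \|C_{k_j} + \nabla C_{k_j}^T s_{k_j}\|^2\right]\\
&\ge \frac{\rho_* K_2}{2}\,\|C_{k_j}\|\min\{K_3\|C_{k_j}\|,\ r\delta_{k_j}\}\\
&\ge \frac{\rho_* K_2\,\varepsilon}{2}\min\{K_3\varepsilon,\ r\delta_*\} \equiv K_{16} > 0,
\end{aligned}$$

where the last inequality also uses Lemma 8.2. Remember that we are only looking at successful steps at this point in the analysis, so

$$\mathcal{L}_{k_j} - \mathcal{L}_{k_j+1} = \mathrm{Ared}_{k_j} \ge \eta_1\,\mathrm{Pred}_{k_j} \ge \eta_1 K_{16} > 0. \tag{8.3}$$

Since $\{\mathcal{L}_k\}$ is bounded below, a contradiction arises if we let $k_j$ go to infinity. □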
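Written out, the contradiction reads as follows. For $k \ge k_\rho$ the penalty parameter is fixed at $\rho_*$ (Lemma 7.12). Every accepted step has $\mathrm{Ared}_k \ge \eta_1\mathrm{Pred}_k$ with $\mathrm{Pred}_k \ge 0$ (assumed here, as the first display of the proof suggests), so $\{\mathcal{L}_k\}$ is nonincreasing from $k_\rho$ on. Summing (8.3) over the first $J$ indices $k_1 < \dots < k_J$ of $\{k_j\}$ that are at least $k_\rho$ gives

$$\mathcal{L}_{k_\rho} - \mathcal{L}_{k_J+1} \;\ge\; \sum_{j=1}^{J}\left(\mathcal{L}_{k_j} - \mathcal{L}_{k_j+1}\right) \;\ge\; J\,\eta_1 K_{16}.$$

The right-hand side grows without bound as $J \to \infty$, which is incompatible with $\{\mathcal{L}_k\}$ being bounded below.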
Theorem 8.4. Under the problem assumptions, given any $\varepsilon_{tol} > 0$, the algorithm terminates because

$$\|W_k^T\nabla_x\ell_k\| + \|C_k\| \le \varepsilon_{tol}. \tag{8.4}$$

Proof. Notice that if we suppose that the algorithm does not terminate and that some subsequence of $\{\|W_k^T\nabla_x\ell_k\|\}$ converges to zero, then nontermination is immediately contradicted by Theorem 8.3.

So, let us suppose that $\|W_k^T\nabla_x\ell_k\| > \varepsilon_1$ for some $\varepsilon_1 > 0$. Since $\|C_k\|$ goes to zero by Theorem 8.3 and the sequence of trust-region radii is bounded below by $\delta_*$, there exists an index $N_1 \ge k_\rho$ such that for all $k \ge N_1$, $\|C_k\| \le \alpha\delta_* \le \alpha\delta_k$, with $\alpha$ as in (7.8). Therefore, by Lemma 7.8, with $i$ taken so that $s_k^i = s_k$ was the successful step, and by Lemma 8.2, we again have an infinite sequence of steps in which the actual decrease in $\mathcal{L}$ is at least $\eta_1 K_{10}\delta_*$. This contradicts the boundedness of $\mathcal{L}$ and completes the proof. □

9. An example algorithm. In this section we propose, as an example, a particular
step choice algorithm for step 2 of Algorithm 6.1. We include different ways
of computing $s_c^n$ according to the dimension of the problem. We will then state the
complete algorithm for finding the trial step. Finally, in Sections 9.5 and 9.6 we will
show that the trial step generated by this algorithm satisfies the pair of fraction of
Cauchy decrease conditions and (5.1).

The  step  choice  algorithm  we  propose  in  this  section  is  based  on  a  conjugate 
directions  method.  It  can  be  viewed  as  a  generalization  of  the  Steihaug-Toint  dogleg 
algorithm  for  the  unconstrained  problem.  This  algorithm  is  much  like  a  trust-region 
version  of  an  algorithm  due  to  Nash  [20]. 

9.1. The Steihaug-Toint dogleg algorithm. This section is devoted to describing
the generalized dogleg algorithm introduced by Steihaug [27] and Toint [30]
for approximating the solution of problem (TRS) (see Section 2). This algorithm is
based on the linear conjugate gradient method.

Algorithm 9.1. Steihaug-Toint dogleg algorithm for (TRS)

Given $x_c$, $\delta_c$, and $0 < \varepsilon < 1$.

step 0: (Initialization)

Set $s_0 = 0$.

Set $r_0 = -(G_c s_0 + \nabla f_c)$.

Set $d_0 = r_0$.

Set $i = 0$.

step 1: Compute $\gamma_i = d_i^T G_c d_i$.

If $\gamma_i > 0$ then go to step 2.

Otherwise (* $d_i$ is a direction of negative or zero curvature *)
compute $\tau > 0$ such that $\|s_i + \tau d_i\| = \delta_c$.

Set $s_c = s_i + \tau d_i$ and terminate.

step 2: Compute $\alpha_i = r_i^T r_i/\gamma_i$.

Set $s_{i+1} = s_i + \alpha_i d_i$.

If $\|s_{i+1}\| < \delta_c$ go to step 3.

Otherwise (* the step is too long, take the dogleg step *)
compute $\tau > 0$ such that $\|s_i + \tau d_i\| = \delta_c$.

Set $s_c = s_i + \tau d_i$ and terminate.

step 3: Compute $r_{i+1} = r_i - \alpha_i G_c d_i$.

If $\|r_{i+1}\|/\|r_0\| \le \varepsilon$ then

set $s_c = s_{i+1}$ and terminate.

step 4: Compute $\beta_i = r_{i+1}^T r_{i+1}/r_i^T r_i$.

Set $d_{i+1} = r_{i+1} + \beta_i d_i$.

Set $i = i+1$ and go to step 1.

The  Steihaug-Toint  dogleg  algorithm  is  well-known  for  being  suitable  for  large- 
scale  unconstrained  problems.  It  can  be  used  in  the  framework  of  any  general  trust- 
region  algorithm  for  solving  problem  (UCMIN). 
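The following is a minimal NumPy sketch of Algorithm 9.1. It assumes the model $g^Ts + \tfrac12 s^TGs$ with $g = \nabla f_c$ and $G = G_c$. It also assumes a relative-residual stopping test $\|r_{i+1}\| \le \varepsilon\|r_0\|$ in step 3, because that test is an assumption rather than something the text pins down. The names `steihaug_toint` and `_to_boundary` are hypothetical.

```python
import numpy as np


def _to_boundary(s, d, delta):
    """Return tau >= 0 with ||s + tau*d|| = delta (requires ||s|| <= delta, d != 0)."""
    a = d @ d
    b = 2.0 * (s @ d)
    c = s @ s - delta**2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def steihaug_toint(G, g, delta, eps=1e-8, max_iter=None):
    """Approximately minimize g^T s + 0.5 s^T G s subject to ||s|| <= delta."""
    n = g.size
    max_iter = 10 * n if max_iter is None else max_iter
    s = np.zeros(n)                     # step 0
    r = -(G @ s + g)
    d = r.copy()
    r0_norm = np.linalg.norm(r)
    if r0_norm == 0.0:
        return s
    for _ in range(max_iter):
        gamma = d @ G @ d               # step 1: curvature along d
        if gamma <= 0.0:                # negative or zero curvature
            return s + _to_boundary(s, d, delta) * d
        rr = r @ r
        alpha = rr / gamma              # step 2
        s_next = s + alpha * d
        if np.linalg.norm(s_next) >= delta:   # too long: dogleg to boundary
            return s + _to_boundary(s, d, delta) * d
        r = r - alpha * (G @ d)         # step 3
        s = s_next
        if np.linalg.norm(r) <= eps * r0_norm:
            return s
        beta = (r @ r) / rr             # step 4
        d = r + beta * d
    return s
```

Using `>=` instead of `>` in step 2 only decides which branch handles a step that lands exactly on the boundary. Both branches then return the same point with $\|s\| = \delta_c$.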

9.2.  Computing  a  quasi-normal  component.  We  start  our  proposed  step 
choice algorithm by finding a quasi-normal component $s_c^n$ of the trial step. This step
must  satisfy  a  fraction  of  Cauchy  decrease  condition  on  the  constraint  norm  inside 
the  inner  trust  region.  It  determines  for  us  which  translate  of  the  null  space  of  the 
constraint  Jacobian  will  be  the  one  in  which  we  choose  the  next  iterate. 

We repeat, because it is so important, that we do not require that $s_c^n$ be normal
to the tangent space, just that it satisfy (5.1). In fact, below we will see that one
way to choose the quasi-normal component is to find a linearly feasible point
and just scale it back onto the inner trust region.

9.2.1. Via Craig's algorithm. First we note that we can solve for a linearly feasible point by using Craig's algorithm on the underdetermined linear system $\nabla C_c^T s + C_c = 0$ (see [5]). Craig's algorithm consists of making the transformation $s = \nabla C_c\, y$ and applying the standard conjugate gradient algorithm to the following $m \times m$ linear system

$$\nabla C_c^T\nabla C_c\, y + C_c = 0.$$

This implies that

$$s_c^{Craig} = s_c^{mn} = -\nabla C_c\left(\nabla C_c^T\nabla C_c\right)^{-1}C_c.$$

Furthermore, the result is the Moore-Penrose pseudoinverse constraint normal, i.e., the minimum-norm solution of $\nabla C_c^T s + C_c = 0$, and, in exact arithmetic, it requires no more than $m$ iterations. Preconditioning is very important, of course, but how to do it will certainly depend on the particular application.
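The following sketch implements the transformation just described with NumPy. It writes `A` for $\nabla C_c$ (an $n\times m$ matrix of full column rank) and `c` for $C_c$. Only products with $\nabla C_c$ and $\nabla C_c^T$ are needed, which is what makes the approach usable for large problems. The function name `craig` and its tolerance handling are illustrative choices, and no preconditioning is attempted.

```python
import numpy as np


def craig(A, c, tol=1e-12, max_iter=None):
    """Craig's method for the underdetermined system A^T s + c = 0.

    CG is applied to (A^T A) y = -c and s = A y is returned, which in exact
    arithmetic is the minimum-norm solution -A (A^T A)^{-1} c.
    """
    m = c.size
    max_iter = 2 * m if max_iter is None else max_iter
    y = np.zeros(m)
    r = -c.astype(float)            # residual of (A^T A) y = -c at y = 0
    p = r.copy()
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol * np.linalg.norm(c):
            break
        Ap = A @ p                  # p^T (A^T A) p = ||A p||^2
        alpha = rr / (Ap @ Ap)
        y += alpha * p
        r -= alpha * (A.T @ Ap)
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return A @ y
```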

Therefore, we can find the step $s_c^n$ by a Steihaug-Toint version of Craig's algorithm in the inner trust region of radius $r\delta_c$. In this algorithm, iterates will be generated until we find the desired constraint normal $s_c^{mn}$ such that $\|s_c^{mn}\| \le r\delta_c$, or until two consecutive iterates $s_c^{Craig_i}$ and $s_c^{Craig_{i+1}}$ straddle the $r\delta_c$ trust-region boundary. In the first case, we set $s_c^n = s_c^{mn}$. In the second case, we choose the dogleg step $s_c^{dog} \in [s_c^{Craig_i}, s_c^{Craig_{i+1}}] \cap \{s : \|s\| = r\delta_c\}$ and set $s_c^n = s_c^{dog}$.

It is not difficult to prove that each Craig iterate is the projection of the origin
onto the subspace of the tangent space spanned by the steps up to that point and that
each $s_c^{Craig_i}$ satisfies (5.1). Now, the Craig steps may not give monotonically increasing
$\ell_2$ length, so a more aggressive strategy that works perfectly well with our theory is to
take the last pair of Craig iterates that straddle the trust-region boundary. In either
case, by convexity, $s_c^{dog}$ also satisfies (5.1). Furthermore, it is clear that $s_c^n = s_c^{dog}$
satisfies the fraction of Cauchy decrease condition required by step 2 of Algorithm
6.1.

9.2.2. Via a linearly feasible point. There are some problems for which
Craig's method might be too slow and too hard to precondition to use the "inner
Steihaug-Toint" algorithm given above. Or, for reasons too technical to be of much
interest here, someone might prefer an implementation that computes a linearly
feasible point $s_c^{lf}$ either by Craig's method or by some special application-dependent
method. The point of this subsection is that when this is the case, $s_c^n$ can be taken
to be the projection of $s_c^{lf}$ back onto the inner trust region. If $s_c^{lf}$ satisfies (5.1), then
so does $s_c^n$.

Suppose we have any linearly feasible point that satisfies (5.1). Then, if it is inside the inner trust region, we can take $s_c^n$ to be that point, and it clearly satisfies the fraction of Cauchy decrease condition required by step 2 of Algorithm 6.1. If $\|s_c^{lf}\| > r\delta_c$, then we take

$$s_c^n = \frac{r\delta_c}{\|s_c^{lf}\|}\,s_c^{lf}.$$
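The scaling rule and the decrease it guarantees can be checked numerically. In this sketch `A` stands for $\nabla C_c$, `c` for $C_c$, and `radius` for $r\delta_c$. The minimum-norm feasible point is used purely for convenience, since any linearly feasible point would do. The helper names are hypothetical.

```python
import numpy as np


def scale_to_inner_region(s_lf, radius):
    """Return (gamma * s_lf, gamma) with gamma = min(1, radius / ||s_lf||)."""
    norm = np.linalg.norm(s_lf)
    gamma = 1.0 if norm <= radius else radius / norm
    return gamma * s_lf, gamma


def linearized_constraint_decrease(A, c, s):
    """||C||^2 - ||C + A^T s||^2 for the linearized constraint model (A = grad C)."""
    return c @ c - np.sum((c + A.T @ s) ** 2)


rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2))            # plays the role of grad C_c (n = 5, m = 2)
c = rng.standard_normal(2)                 # plays the role of C_c
s_lf = -A @ np.linalg.solve(A.T @ A, c)    # a linearly feasible point
radius = 0.3 * np.linalg.norm(s_lf)        # r * delta_c, chosen so scaling is active
s_n, gamma = scale_to_inner_region(s_lf, radius)
dec = linearized_constraint_decrease(A, c, s_n)
# Since C + A^T s_lf = 0, dec equals [1 - (1 - gamma)^2] ||C||^2 >= gamma ||C||^2.
```

The final check mirrors the bound $[1-(1-\gamma_c)^2]\|C_c\|^2 \ge \gamma_c\|C_c\|^2$ that appears in the proof of Lemma 9.4.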


A classical mathematical programming way to compute a linearly feasible point, which encompasses some special-purpose methods we have seen for some inverse problems, is as follows. In some way, divide $s$ into so-called basic and nonbasic components. Let us assume that we have done so and, using column pivoting, we write $\nabla C^T$ as $\nabla C^T = [B \mid N]$, where $B$ is a nonsingular matrix corresponding to the basic components of $s$. This corresponds to

$$W_c = \begin{bmatrix} -B_c^{-1}N_c \\ I_{n-m} \end{bmatrix}.$$

Now since

$$\nabla C_c^T s = B_c s_B + N_c s_N = -C_c,$$

we have

$$s_B = -B_c^{-1}(C_c + N_c s_N),$$

and then if we choose $s_N = 0$ and $s_B = -B_c^{-1}C_c$, a linearly feasible point will be

$$s_c^{lf} = (s_B, s_N)^T = (-B_c^{-1}C_c,\ 0)^T.$$

As long as $\{\|B_c^{-1}\|\}$ is uniformly bounded by some constant $\gamma$, $s_c^{lf}$ satisfies (5.1), where the constant here is $\gamma$. This is a standard assumption for important classes of discretized optimal control problems, though it is stronger than our assumption that $[\nabla C(x_c)^T\nabla C(x_c)]^{-1}$ is uniformly bounded.
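The basic/nonbasic construction is easy to express with index sets. In the sketch below, `A_T` stands for $\nabla C_c^T$ (an $m\times n$ matrix), `c` for $C_c$, and `basic` for the chosen basic columns. Instead of permuting columns explicitly, the blocks of $s_c^{lf}$ and $W_c$ are written back into the original coordinate positions, which is an implementation choice of this sketch. `basic_nonbasic_step` is a hypothetical name.

```python
import numpy as np


def basic_nonbasic_step(A_T, c, basic):
    """Linearly feasible point and reducer matrix from a basic/nonbasic split.

    A_T is m-by-n; the columns listed in `basic` must form a nonsingular B.
    Returns s_lf with A_T @ s_lf = -c and W with A_T @ W = 0.
    """
    m, n = A_T.shape
    basic = list(basic)
    nonbasic = [j for j in range(n) if j not in basic]
    B, N = A_T[:, basic], A_T[:, nonbasic]
    s_lf = np.zeros(n)
    s_lf[basic] = -np.linalg.solve(B, c)       # s_B = -B^{-1} C, s_N = 0
    W = np.zeros((n, n - m))
    W[basic, :] = -np.linalg.solve(B, N)        # block -B^{-1} N
    W[nonbasic, :] = np.eye(n - m)              # identity on nonbasic rows
    return s_lf, W
```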

9.3. Computing the tangential component. We now assume that we have the quasi-normal component $s_c^n$. We start the process of computing the tangent-space component by forming the basis matrix $W_c \in \mathbb{R}^{n\times(n-m)}$. The columns of $W_c$ form a basis for the null space of the constraints, $\mathcal{N}(\nabla C_c^T)$.

We then transform the constrained problem into an unconstrained trust-region problem of dimension $n-m$ in the following form:

$$\begin{cases} \text{minimize} & \tfrac12\,\bar s^T\bar H_c\bar s + \nabla q_c(s_c^n)^T W_c\bar s + q_c(s_c^n)\\ \text{subject to} & \|W_c\bar s + s_c^n\| \le \delta_c, \end{cases}$$

where $\bar s \in \mathbb{R}^{n-m}$, and set $s_c^t = W_c\bar s$. The step $s_c^t$ is the component in the tangent space of the constraints, and the matrix $\bar H_c = W_c^T H_c W_c \in \mathbb{R}^{(n-m)\times(n-m)}$ is the reduced Hessian matrix. Now we use the Steihaug-Toint algorithm to determine $\bar s$ such that $\|W_c\bar s + s_c^n\| \le \delta_c$.
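The reduction to $n-m$ variables rests on an identity for a quadratic $q_c$: $q_c(s_c^n + W_c\bar s) = q_c(s_c^n) + \nabla q_c(s_c^n)^T W_c\bar s + \tfrac12\bar s^T\bar H_c\bar s$. The sketch below forms $\bar H_c$ and the reduced gradient and checks the identity on random data. It assumes $q_c(s) = g^Ts + \tfrac12 s^TH_cs$ with the constant term dropped and $H_c$ symmetric. `reduced_model` is a hypothetical name.

```python
import numpy as np


def reduced_model(H, g, s_n, W):
    """Data of the tangential subproblem for q(s) = g^T s + 0.5 s^T H s (H symmetric).

    Returns (Hbar, gbar, q0) such that
    q(s_n + W @ sbar) == q0 + gbar @ sbar + 0.5 * sbar @ Hbar @ sbar.
    """
    Hbar = W.T @ H @ W              # reduced Hessian W^T H W
    gbar = W.T @ (H @ s_n + g)      # W^T grad q(s_n)
    q0 = g @ s_n + 0.5 * s_n @ H @ s_n
    return Hbar, gbar, q0


rng = np.random.default_rng(2)
n, k = 5, 3
M = rng.standard_normal((n, n))
H = M + M.T                          # symmetric model Hessian
g = rng.standard_normal(n)
s_n = rng.standard_normal(n)
W = rng.standard_normal((n, k))      # the identity holds for any n-by-k W
Hbar, gbar, q0 = reduced_model(H, g, s_n, W)
sbar = rng.standard_normal(k)
s = s_n + W @ sbar
lhs = g @ s + 0.5 * s @ H @ s
rhs = q0 + gbar @ sbar + 0.5 * sbar @ Hbar @ sbar
```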

The complete algorithm for finding the trial step is presented in the following
section.
9.4. Conjugate reduced gradient algorithm for EQC. Here we write, in more detail, the example algorithm for computing a trial step.

Algorithm 9.2. The CRG step choice algorithm
Given $x_c \in \mathbb{R}^n$, $\delta_c > 0$, and $0 < \varepsilon < 1$.

I. FEASIBILITY:

1) If $x_c$ is feasible, go to II.

2) Determine $s_c^n$. (* Use, for example, $s_c^n = s_c^{dog}$ or $s_c^n = s_c^{lf}$ with $s_c^{lf} = (-B_c^{-1}C_c, 0)^T$. *)

II. MINIMIZATION:

(* Find $s_c$ by applying the CRG/Steihaug-Toint algorithm to

$$\begin{cases} \text{minimize} & q_c(s)\\ \text{subject to} & \nabla C_c^T(s - s_c^n) = 0,\\ & \|s\| \le \delta_c, \end{cases}$$

starting from $s = s_c^n$. *)

step 0: (Initialization)

Set $s_0 = s_c^n$.

Set $r_0 = -W_c^T(H_c s_c^n + \nabla\ell_c)$.

Set $d_0 = W_c r_0$.

Set $i = 0$.

step 1: Compute $\gamma_i = d_i^T H_c d_i$.

If $\gamma_i > 0$ then go to step 2;

otherwise (* $d_i$ is a direction of negative or zero curvature *)
compute $\tau > 0$ such that $\|s_i + \tau d_i\| = \delta_c$.

Set $s_c = s_i + \tau d_i$ and terminate.

step 2: Compute $\alpha_i = r_i^T r_i/\gamma_i$.

Set $s_{i+1} = s_i + \alpha_i d_i$.

If $\|s_{i+1}\| < \delta_c$ go to step 3;

otherwise (* the step is too long, take the dogleg step *)
compute $\tau > 0$ such that $\|s_i + \tau d_i\| = \delta_c$.

Set $s_c = s_i + \tau d_i$ and terminate.

step 3: Compute $r_{i+1} = r_i - \alpha_i W_c^T H_c d_i$.

If $\|r_{i+1}\|/\|r_0\| \le \varepsilon$ then

set $s_c = s_{i+1}$ and terminate.

step 4: Compute $\beta_i = r_{i+1}^T r_{i+1}/r_i^T r_i$.

Set $d_{i+1} = W_c r_{i+1} + \beta_i d_i$.

Set $i = i+1$ and go to step 1.
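The following is a NumPy sketch of phase II of Algorithm 9.2, with an assumed relative-residual stopping test $\|r_{i+1}\| \le \varepsilon\|r_0\|$. Here `H`, `g`, `W`, and `s_n` stand for $H_c$, $\nabla\ell_c$, $W_c$, and $s_c^n$, and the model is $q_c(s) = g^Ts + \tfrac12 s^TH_cs$ without its constant. The residual is kept in $\mathbb{R}^{n-m}$, while each direction lives in $\mathbb{R}^n$ as $W_c$ times a reduced direction. The sketch assumes $\|s_c^n\| \le \delta_c$, which holds when $s_c^n$ lies in the inner region of radius $r\delta_c$ with $r \le 1$. `crg_step` is a hypothetical name.

```python
import numpy as np


def _to_boundary(s, d, delta):
    """tau >= 0 with ||s + tau*d|| = delta, assuming ||s|| <= delta and d != 0."""
    a, b, c = d @ d, 2.0 * (s @ d), s @ s - delta**2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def crg_step(H, g, W, s_n, delta, eps=1e-8, max_iter=None):
    """Minimize g^T s + 0.5 s^T H s over s = s_n + W sbar, ||s|| <= delta (sketch)."""
    k = W.shape[1]
    max_iter = 10 * k if max_iter is None else max_iter
    s = np.array(s_n, dtype=float)      # step 0: s_0 = s_n
    r = -W.T @ (H @ s + g)              # reduced residual in R^{n-m}
    d = W @ r                           # full-space direction W r_0
    r0 = np.linalg.norm(r)
    if r0 == 0.0:
        return s
    for _ in range(max_iter):
        gamma = d @ H @ d               # step 1: curvature along d_i
        if gamma <= 0.0:
            return s + _to_boundary(s, d, delta) * d
        rr = r @ r
        alpha = rr / gamma              # step 2
        s_next = s + alpha * d
        if np.linalg.norm(s_next) >= delta:
            return s + _to_boundary(s, d, delta) * d
        r = r - alpha * (W.T @ (H @ d))  # step 3
        s = s_next
        if np.linalg.norm(r) <= eps * r0:
            return s
        beta = (r @ r) / rr             # step 4
        d = W @ r + beta * d
    return s
```

Every direction has the form $W_c\bar d$, so $\nabla C_c^T s$ stays equal to $\nabla C_c^T s_c^n$ throughout the iteration. This is what preserves the linearized-constraint decrease obtained by $s_c^n$.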

It is worth noting here that this way of computing the tangent step does not have
the property, enjoyed by the Steihaug-Toint iteration started at the origin, that once an
iterate goes outside the trust region it could not come back in were the CG iteration
continued. This means that the relaxed SQP step might lie inside the trust region, but
the algorithm above might not return this more desirable step if the gradient scale and
trust-region scale are inconsistent.

It would be better otherwise, of course, but the steps given here will lead to
convergence, and we hope that near the solution, when it becomes important to take
SQP steps, the trust region will be large enough to compensate for the difference in
shape. If the implementer wanted to be more aggressive, there are various ways that
fit our theory to deal with this situation. For example, we could take the dogleg step
based on the last time the CG iteration leaves the trust region rather than the first.
Our concern here is to prove convergence theorems under the weakest conditions on the
algorithm, and to show that reasonable algorithms satisfy those conditions, not to
advocate particular implementation details of no consequence to the theory.

9.5. Sufficient decrease by the steps. In this section we show that the conjugate
reduced gradient algorithm produces steps that satisfy the conditions we impose
on the steps in step 2 of Algorithm 6.1. In particular, we show that both the
quasi-normal and the tangential components of the trial steps satisfy their respective
fraction of Cauchy decrease conditions.

The following lemma gives a bound on the reducer matrix $W_c$. The proof is straightforward, so we will omit it.

Lemma 9.3. Under the problem assumptions, if there is a uniform bound on the matrix $B(x)^{-1}$, then the reducer matrix

$$W(x) = \begin{bmatrix} -B(x)^{-1}N(x) \\ I_{n-m} \end{bmatrix}$$

is bounded for all $x \in \Omega$.

The following lemma shows that the quasi-normal component $s_c^n$ satisfies a fraction of Cauchy decrease condition on the quadratic model of the linearized constraints.

Lemma 9.4. Let $s_c$ be a step generated by Algorithm 9.2 at the current iterate. Then $s_c$ satisfies a fraction of Cauchy decrease condition on the quadratic model of the linearized constraints, i.e.,

$$\|C_c\|^2 - \|C_c + \nabla C_c^T s_c\|^2 \ge K_2\|C_c\|\min\{r\delta_c,\ K_3\|C_c\|\}, \tag{9.1}$$

where $K_2$ and $K_3$ are constants independent of the iterates.

Proof. Suppose that we are applying Craig's algorithm to find $s_c^n$. Let $\{s_1, s_2, \dots\}$ be the sequence of iterates generated by the algorithm; hence for all $i$,

$$s_i = \arg\min\{\|\nabla C_c^T s + C_c\| : s \in \mathrm{span}\{p_1,\dots,p_i\}\}.$$

Assume that $\|s_i\| \le r\delta_c$ and $\|s_{i+1}\| > r\delta_c$. Therefore

$$s_c^{dog} = \theta s_i + (1-\theta)s_{i+1}, \quad \text{with } \theta \in [0,1].$$

It is easy to see that

$$\|\nabla C_c^T s_i + C_c\| \le \|\nabla C_c^T s_c^{CP} + C_c\|$$

and

$$\|\nabla C_c^T s_{i+1} + C_c\| \le \|\nabla C_c^T s_c^{CP} + C_c\|.$$

By convexity,

$$\|\nabla C_c^T s_c^{dog} + C_c\| \le \|\nabla C_c^T s_c^{CP} + C_c\|.$$

Thus,

$$\|C_c\|^2 - \|C_c + \nabla C_c^T s_c^{dog}\|^2 \ge \|C_c\|^2 - \|C_c + \nabla C_c^T s_c^{CP}\|^2,$$

and we can apply Lemma 2.1.
Now suppose that $s_c^n$ is given by $s_c^n = \gamma_c s_c^{lf}$, with $\gamma_c = r\delta_c/\|s_c^{lf}\|$ when $\|s_c^{lf}\| > r\delta_c$ and $\gamma_c = 1$ otherwise. When $\gamma_c = 1$, we have

$$\|C_c\|^2 - \|\nabla C_c^T s_c^n + C_c\|^2 = \|C_c\|^2 - \|\nabla C_c^T s_c^{lf} + C_c\|^2 = \|C_c\|^2.$$

When $\gamma_c < 1$, we have

$$\begin{aligned}
\|C_c\|^2 - \|C_c + \nabla C_c^T s_c^n\|^2 &= \|C_c\|^2 - \|C_c + \gamma_c\nabla C_c^T s_c^{lf}\|^2\\
&\ge \|C_c\|^2 - \left[(1-\gamma_c)\|C_c\| + \gamma_c\|C_c + \nabla C_c^T s_c^{lf}\|\right]^2\\
&= \left[1-(1-\gamma_c)^2\right]\|C_c\|^2 \ge \gamma_c\|C_c\|^2.
\end{aligned}$$

The desired result will follow from the definition of $s_c^{lf}$ and Lemma 9.3. □

The following lemma shows that the null-space component $s_c^t$ satisfies a fraction of Cauchy decrease condition on the quadratic model of the Lagrangian.

Lemma 9.5. Let $s_c$ be a trial step generated by the algorithm. Then, under the problem assumptions, there exists a positive constant $K_4$, which does not depend on $x_c$, such that

$$q_c(s_c^n) - q_c(s_c) \ge \tfrac12\|W_c^T\nabla q_c(s_c^n)\|\min\left\{K_4\|W_c^T\nabla q_c(s_c^n)\|,\ \ldots\right\}.$$

Proof. Since we are solving the reduced problem

$$\begin{cases} \text{minimize} & \tfrac12\,\bar s^T\bar H_c\bar s + \nabla q_c(s_c^n)^T W_c\bar s + q_c(s_c^n)\\ \text{subject to} & \|W_c\bar s + s_c^n\| \le \delta_c, \end{cases}$$

which is an unconstrained trust-region subproblem, the proof is immediate from Theorem 2.5 of Steihaug [27] followed by the use of the problem assumptions and Lemma 9.3. □

We state the following lemma here for completeness.

Lemma 9.6. The quasi-normal component computed by our proposed step choice algorithm satisfies

$$\|s_c^n\| \le K_1\|C_c\|,$$

where $K_1$ is a positive constant independent of the iterates.

Proof. The proof is given with the discussion of how to compute a quasi-normal component. See Section 9.2. □

10. Discussion and concluding remarks. We have established a global convergence
theory for a broad class of nonlinear programming algorithms for the smooth
problem with equality constraints. The class includes algorithms based on the full-space
approach and the tangent-space approach. The family is characterized by generating
steps that satisfy very mild conditions on the normal and tangential components.
The normal component satisfies a fraction of Cauchy decrease condition on the
quadratic model of the linearized constraints, and the tangential component satisfies a
fraction of Cauchy decrease condition on the quadratic model of the Lagrangian function
associated with the problem, reduced to the tangent space of the constraints. Of
course the step, which is the sum of these components, satisfies both conditions.

The augmented Lagrangian was chosen as a merit function. The scheme for updating
the penalty parameter is the one proposed by El-Alem [9], since it predicts
that the merit function is decreased at each iteration by at least a fraction of Cauchy
decrease on the quadratic model of the linearized constraints. This indicates compatibility
with the fraction of Cauchy decrease conditions imposed on the trial steps.

In  presenting  the  algorithm,  we  have  left  open  the  way  of  computing  the  trial 
steps  to  satisfy  the  double  fraction  of  Cauchy  decrease  condition.  This  will  allow  the 
inclusion  of  a  wide  variety  of  trial  step  calculation  techniques.  For  the  same  reason 
we  have  left  unspecified  the  way  of  approximating  the  Lagrange  multiplier  vector  and 
the  Hessian  matrix. 

With  respect  to  the  trial  steps,  we  have  suggested  an  algorithm  of  the  class  that 
should  work  quite  well  for  large  problems.  The  algorithm  is  a  generalization  of  the 
Steihaug-Toint  dogleg  algorithm  for  the  unconstrained  case.  This  algorithm  was  one 
we had in mind as motivation for the convergence theory.

The  least-squares  or  projection  formula  can  be  used  as  a  scheme  for  estimating 
the  multiplier  since  it  fits  the  condition  imposed  on  the  multiplier  updating  scheme. 
Namely,  under  the  standard  assumptions,  it  produces  bounded  multipliers  for  the  local 
models. For large problems, $\lambda = \ldots$ is likely to be a much preferable formula
because of the cost of the least-squares solution. Furthermore, this will match better
with the reducer matrix $W$, especially for problems where $B$ can be easily identified.
See Dennis and Lewis [6]. In either case, the uniform boundedness of $\{\lambda_k\}$ follows
from the problem assumptions.

The exact Hessian matrix can perhaps be obtained by using automatic differentiation
or an adjoint integration approach. See Bischof et al. [1]. Alternatively, an approximation
to the Hessian of the Lagrangian can be used. Also, for example, setting $H_k$
to a fixed matrix (e.g., $H_k = 0$) for all $k$ is valid. The question of how to use a secant
approximation of the Hessian of the Lagrangian in order to produce a more efficient
algorithm is a research topic. We believe that Tapia [29] will be of considerable value
here.

A  related  question  that  has  to  be  looked  at  is  the  search  for  preconditioners  to 
produce  more  efficient  algorithms.  We  believe  that  the  reducer  matrix  W  should  play 
a  role  in  that  search.  See  Dennis  and  Lewis  [6]. 

This theory is developed for the equality constrained case, but it can be applied to
the general case by one of the strategies known as EQP and IQP. Here, we mean that
in the EQP strategy the choice of the active set is made outside the algorithm that
determines the step, while in the IQP strategy that choice is made inside the procedure
that determines the step. Since the active set may change at each iteration, the choice
of the submatrix $B$ will be strongly affected. Certainly, this is an important topic
that deserves to be investigated.